And he brought him forth abroad, and said,
Look now toward heaven, and tell the stars, if 
thou be able to number them: and he said unto 
him, So shall thy seed be. 


Genesis 15, verse 5 
Introduction


This book is written from the perspective of several passionately held beliefs 
about mathematical education. The first is that mathematics is a good story. 
Theorems are not discovered in isolation, but happen as part of a culture, and 
they are generally motivated by paradigms. In this book we are going to show 
how one result from antiquity can be used to illuminate the study of much 
that forms the undergraduate curriculum in number theory at a typical U.K. 
university. The result is the Fundamental Theorem of Arithmetic. Our hope 
is that students will understand that number theory is not just a collection of 
tricks and isolated results but has a coherence fueled directly by a connected 
narrative that spans centuries. 

The second belief is that mathematics students (and indeed professional 
mathematicians) come to the subject with different preferences and evolving 
strengths. Therefore, we have endeavored to present differing approaches to 
number theory. One way to achieve this is the obvious one of selecting ma- 
terial from both the algebraic and the analytic disciplines. Less obviously, in 
the early part of the book particularly, we sometimes present several different 
proofs of a single result. The aim is to try to capture the imagination of the 
reader and help her or him to discover his or her own taste in mathematics. 
The book is written under the assumption that students are being exposed 
to the power of analysis in courses such as complex variables, as well as the 
power of abstraction in courses such as algebra. Thus we use notions from 
finite group theory at several points to give alternative proofs. Often the re- 
sulting approaches simplify and promote generalization, as well as providing 
elegance. We also use this approach because we want to try to explain how 
different approaches to elementary results are worked out later in different 
approaches to the subject in general. Thus Euler’s proof of the Fundamental 
Theorem of Arithmetic could be taken to prefigure the development of analytic 
number theory with its ingenious use of the Euler product formula. When we
move further into the analytic aspects of arithmetic, Euler’s relatively simple 
observation may seem like a rather flimsy pretext. However, the view that 
many nineteenth-century mathematicians took of functions (complex functions
particularly) was profoundly influenced by the Fundamental Theorem
of Arithmetic. In their view, many functions are factorizable objects, and we 
will try to illustrate this in describing some of the great achievements of that 
century. 

Having spoken of different approaches, it will surprise few readers that 
number theory has many streams. A major surprise is the fact that some 
of these meet again: Chapter 11 shows that many of the themes in Chap- 
ters 1-10 become reconciled further on. The classical class number formula 
reconciles the analytic stream of ideas with the algebraic. We also discuss,
necessarily in general terms, the L-function associated with an elliptic curve
and the conjectures of Birch and Swinnerton-Dyer, which draw together the 
elliptic, algebraic and analytic streams. The underlying motif is the theory 
of L-functions. As we enter a new millennium, it has become clear that one 
of the ways into the deepest parts of number theory requires a better under- 
standing of these fundamental objects. 

The third belief is that number theory is a living subject, even when stud- 
ied at an elementary level. The onset of electronic computing gave the subject 
an enormous boost, and it is a pleasure to be able to record some recent devel- 
opments. The language of arithmetical complexity has helped to change the 
way we think about numbers. Modern computers can carry out calculations 
with numbers that are almost unimaginably large. We recommend that any 
reader unfamiliar with modern number theory packages tries a few experi- 
ments using some of the excellent free software available from the internet. To 
start to think of the issues raised by large integer calculation can be no bad 
thing. Intellectually too, this computational topic illustrates an interesting 
point about the enduring nature of the paradigm. Our story begins over two 
millennia ago, yet it is the same questions that continue to fascinate us. What 
are the primes like? Where can they be found? How can the prime factors of 
an integer be computed? Whether these questions will endure awhile longer 
nobody can tell. The history of these problems already presents a fascinating 
story worth telling, and one that says a lot about one of the most important 
and beautiful narratives of enquiry in human history — mathematics. 

One of the most striking and pleasurable aspects of number theory is the 
extent of time and range of cultures over which it has been studied. We do 
not go into a detailed history of the developments described here, but the 
names and places given in the list of “Dramatis Personae” should give some 
idea of how widely number theory has been studied. The names in this list are 
rather crudely Anglicized and the locations somewhat arbitrarily modernized. 
The many living mathematicians who have made significant contributions to 
the topics covered here have been omitted but may be found on the Web 
site in [113]. A densely written, comprehensive review of number theory up 
to about 1920 may be found in Dickson’s history [42], [43], [44]; a discursive 
and masterly account of the four millennia ending in 1798 is provided by 
Weil [157]. 
Finally, we say something about the way this book could be used. It is
based on three courses taught at the University of East Anglia on various 
aspects of number theory (analytic, algebraic/geometric, and computational), 
mostly at the final-year undergraduate level. We were motivated in part by 
G. A. and J. M. Jones’ attractive book [84]. Their book sets out to deal with 
the subject as it is actually taught. Typically, third-year students will not 
have done a course in number theory and their experience will necessarily 
be fragmentary. Like [84], our book begins in quite an elementary way. We 
have found that the different years at a university do not equate neatly with 
different abilities: Students in their early years can often be stretched well 
beyond what seems possible, and upper-level students do not complain about 
beginning in simple ways. We will try to show how different chapters can 
be put together to make a course; the book can be used as a basis for two 
upper-level courses and one at an intermediate level. 

We thank many people for contributing to this text. Notable among them 
are Christian Rottger, for writing up notes from an analytic number theory 
course at UEA; Sanju Velani, for making available notes from his analytic 
number theory course; several cohorts of UEA undergraduates for feedback on 
lecture courses; Neal Koblitz and Joe Silverman for their inspiring books; and 
Elena Nardi for help with the ancient Greek in Section 1.7.1. We thank Karim 
Belabas, Robin Chapman, Sue Everest, Gareth and Mary Jones, Graham 
Norton, David Pierce, Peter Pleasants, Christian Rottger, Alice Silverberg, 
Shaun Stevens, Alan and Honor Ward, and others for pointing out errors and 
suggesting improvements. Errors and solecisms that remain are entirely the 
authors’ responsibility. 


February 14, 2005 Graham Everest 
Norwich, UK Thomas Ward 


NOTATION AND TERMINOLOGY 


“Arithmetic” is used both as a noun and an adjective. The particular notation
used is collected at the start of the index. The symbols $\mathbb{N}$, $\mathbb{P}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$
denote the natural numbers $\{1,2,3,\ldots\}$, prime numbers $\{2,3,5,7,\ldots\}$, integers,
rational numbers, real numbers, and complex numbers, respectively.
Any field with $q = p^r$ elements, $p \in \mathbb{P}$ and $r \in \mathbb{N}$, is denoted $\mathbb{F}_q$, and $\mathbb{F}_q^*$
denotes its multiplicative group; the field $\mathbb{F}_p$, $p \in \mathbb{P}$, is identified with the
set $\{0,1,\ldots,p-1\}$ under addition and multiplication modulo $p$. For a complex
number $s = \sigma + it$, $\Re(s) = \sigma$ and $\Im(s) = t$ denote the real and imaginary
parts of $s$ respectively. The symbol $\mid$ means “divides”, so for $a, b \in \mathbb{Z}$, $a \mid b$ if
there is an integer $k$ with $ak = b$. For any set $X$, $|X|$ denotes the cardinality
of $X$. The greatest common divisor of $a$ and $b$ is written $\gcd(a, b)$. Products
are written using $\cdot$ as in $12 = 3 \cdot 4$ or $n! = 1 \cdot 2 \cdots (n-1) \cdot n$. The order
of growth of functions $f, g$ (usually these are functions $\mathbb{N} \to \mathbb{R}$) is compared
using the following notation:
\[ f \sim g \quad \text{if } \frac{f(x)}{g(x)} \to 1 \text{ as } x \to \infty; \]
\[ f = O(g) \quad \text{if there is a constant } A > 0 \text{ with } f(x) \le A\,g(x) \text{ for all } x; \]
\[ f = o(g) \quad \text{if } \frac{f(x)}{g(x)} \to 0 \text{ as } x \to \infty. \]


In particular, $f = O(1)$ means that $f$ is bounded. The relation $f = O(g)$ will
also be written $f \ll g$, particularly when it is being used to express the fact
that two functions are commensurate, $f \ll g \ll f$. A sequence $a_1, a_2, \ldots$ will
be denoted $(a_n)$.


REFERENCES 


The references are not comprehensive, and material that is not explicitly cited 
is nonetheless well-known. It is inevitable that we have borrowed ideas and 
used them inadvertently without citation; we apologize for any egregious in- 
stances of this. The general references that are likely to be most accessible 
without much background are as follows. For Chapter 2, [147]; for Chapters 3 
and 4, [77], [96], [147], and [154]; for Chapters 5-7, [27] and [143]; for Chap- 
ters 8-10, [4], [75], and [81]; for Chapter 9, [6]; and for Chapter 12, [21], [22], 
[36], [90], and [66]. 


POSSIBLE COURSES 


A course on analytic number theory could follow Chapters 1, 8, 9, and 10; 
one on Diophantine problems or elliptic curves could follow Chapters 1, 2, 5, 
6, and 7. A lower-level course on algebraic number theory could be based on 
Chapters 1, 2, 3 and 4; one on complexity could be based on Chapters 1 and 12. 
(These could also be used for the complexity part of a course on cryptography.) 
The exercises are generally routine applications of the methods in the text, 
but exercises marked * are to be viewed as projects, some of them requiring 
further reading and research. 
DRAMATIS PERSONAE


Person                                  Date                Country
Pythagoras of Samos                     569 B.C.-475 B.C.   Greece, Egypt
Euclid of Alexandria                    325 B.C.-265 B.C.   Greece, Egypt
Eratosthenes of Cyrene                  276 B.C.-194 B.C.   Libya, Greece, Egypt
Diophantus of Alexandria                200-284             Greece, Egypt
Hypatia of Alexandria                   370-415             Egypt
Sun Zi                                  400-460             China
Brahmagupta                             598-670             India
Abu Ali al-Hasan ibn al-Haytham         965-1040            Iraq, Egypt
Bhaskaracharya                          1114-1185           India
Leonardo Pisano Fibonacci               1170-1250           Italy
Qin Jiushao                             1202-1261           China
Pietro Antonio Cataldi                  1548-1626           Italy
Claude Gaspar Bachet de Méziriac        1581-1638           France
Marin Mersenne                          1588-1648           France
Pierre de Fermat                        1601-1665           France
James Stirling                          1692-1770           Scotland
Leonhard Euler                          1707-1783           Switzerland, Russia
Joseph-Louis Lagrange                   1736-1813           Italy, France
Lorenzo Mascheroni                      1750-1800           Italy, France
Adrien-Marie Legendre                   1752-1833           France
Jean Baptiste Joseph Fourier            1768-1830           France
Johann Carl Friedrich Gauss             1777-1855           Germany
Siméon Denis Poisson                    1781-1840           France
August Ferdinand Möbius                 1790-1868           Germany
Niels Henrik Abel                       1802-1829           Norway
Carl Gustav Jacob Jacobi                1804-1851           Germany
Johann Peter Gustav Lejeune Dirichlet   1805-1859           France, Germany
Joseph Liouville                        1809-1882           France
Ernst Eduard Kummer                     1810-1893           Germany
Évariste Galois                         1811-1832           France
Karl Theodor Wilhelm Weierstrass        1815-1897           Germany
Pafnuty Lvovich Tchebychef              1821-1894           Russia
Georg Friedrich Bernhard Riemann        1826-1866           Germany, Italy
François Édouard Anatole Lucas          1842-1891           France
Jules Henri Poincaré                    1854-1912           France
David Hilbert                           1862-1943           Germany
Srinivasa Aiyangar Ramanujan            1887-1920           India, England
Louis Joel Mordell                      1888-1972           USA, England
Carl Ludwig Siegel                      1896-1981           Germany
Emil Artin                              1898-1962           Austria, Germany
Kurt Mahler                             1903-1988           Germany, UK, Australia
Derrick Henry Lehmer                    1905-1991           USA
André Weil                              1906-1998           France, USA
A Brief History of Prime


Most of the results in this book grow out of one theorem that has probably 
been known in some form since antiquity. 


Theorem 1.1. [FUNDAMENTAL THEOREM OF ARITHMETIC] Every integer 
greater than 1 can be expressed as a product of prime numbers in a way that 
is unique up to order. 


For the moment, we are using the term prime in its most primitive form — 
to mean an irreducible integer greater than one. Thus a positive integer p is 
prime if p > 1 and the factorization p = ab into positive integers implies that 
either a = 1 or b = 1. The expression “up to order” means simply that we
regard, for example, the two factorizations $6 = 2 \cdot 3 = 3 \cdot 2$ as the same.

Theorem 1.1, the Fundamental Theorem of Arithmetic, will reverberate 
throughout the text. The fact that the primes are the building blocks for all 
integers already suggests they are worth particular study, rather in the way 
that scientists study matter at an atomic level. In this case, we need a way of 
looking for primes and methods to construct them, identify them, and even 
quantify their appearance if possible. Some of these quests took thousands of 
years to fulfill, and some are still works in progress. At the end of this chapter, 
we will give a proof of Theorem 1.1, but for now we want to get on with our 
main theme. 


1.1 Euclid and Primes 


The first consequence of the Fundamental Theorem of Arithmetic for the 
primes is that there must be infinitely many of them. 


Theorem 1.2. [EUCLID] There are infinitely many primes. 


To emphasize the diversity of approaches to number theory, we will give 
several proofs of this famous result. 
EUCLID’S PROOF IN MODERN FORM. If there are only finitely many primes,
we can list them as $p_1, \ldots, p_r$. Let


\[ N = p_1 \cdots p_r + 1 > 1. \]


By the Fundamental Theorem of Arithmetic, $N$ can be factorized, so it must
be divisible by some prime $p_k$ of our list. Since $p_k$ also divides $p_1 \cdots p_r$, it
must divide the difference

\[ N - p_1 \cdots p_r = 1, \]

which is impossible, as $p_k > 1$.
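
The construction in Euclid's proof can be tried out numerically. The following Python sketch is an illustration only, not part of the original argument: the function names `smallest_prime_factor` and `euclid_new_prime` are hypothetical, and trial division is used purely for simplicity. Given a finite list of primes, it forms N as their product plus 1 and returns a prime factor of N, which cannot appear in the list.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n >= 2 by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_new_prime(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    n = prod(primes) + 1            # N = p_1 ... p_r + 1
    q = smallest_prime_factor(n)    # exists because N > 1
    assert q not in primes          # otherwise q would divide N - p_1...p_r = 1
    return q


print(euclid_new_prime([2, 3, 5]))              # 31
print(euclid_new_prime([2, 3, 5, 7, 11, 13]))   # 59, since 30031 = 59 * 509
```

The second call shows that N itself need not be prime; the proof only needs some prime factor of N.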


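EULER’S ANALYTIC PROOF. Assume that there are only finitely many primes,
so they may be listed as $p_1, \ldots, p_r$. Consider the product


\[ X = \prod_{k=1}^{r} \left(1 - \frac{1}{p_k}\right)^{-1}. \]


The product is finite since 1 is not a prime and by hypothesis there are only
finitely many primes. Now expand each factor into a convergent geometric
series,


\[ \frac{1}{1 - \frac{1}{p}} = 1 + \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \cdots. \]


For any fixed $K$, we deduce that

\[ \frac{1}{1 - \frac{1}{p}} \ge 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots + \frac{1}{p^K}. \]

Putting this into the equation for $X$ gives


\[ X \ge \left(1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^K}\right) \left(1 + \frac{1}{3} + \frac{1}{3^2} + \cdots + \frac{1}{3^K}\right) \left(1 + \frac{1}{5} + \frac{1}{5^2} + \cdots + \frac{1}{5^K}\right) \cdots \left(1 + \frac{1}{p_r} + \frac{1}{p_r^2} + \cdots + \frac{1}{p_r^K}\right) \]
\[ = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \cdots \]
\[ = \sum_{n \in \mathcal{N}(K)} \frac{1}{n}, \tag{1.1} \]


where
\[ \mathcal{N}(K) = \{ n \in \mathbb{N} \mid n = p_1^{e_1} \cdots p_r^{e_r},\ e_i \le K \text{ for all } i \} \]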


denotes the set of all natural numbers with the property that each prime 
factor appears no more than K times. Notice that the identity (1.1) requires 
the Fundamental Theorem of Arithmetic. Given any number $n \in \mathbb{N}$, if $K$ is
large enough, then $n \in \mathcal{N}(K)$, so we deduce that


\[ X \ge \sum_{n=1}^{\infty} \frac{1}{n}. \]


The series on the right-hand side (known as the harmonic series) diverges 
to infinity, but X is finite. Again we have reached a contradiction from the 
assumption that there are finitely many primes. 
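
The identity (1.1) can be checked exactly on a small example. The sketch below is an illustration under stated assumptions: the helper names `euler_X`, `truncated_product` and `sum_over_NK` are hypothetical, the list of primes is a toy stand-in for the supposedly complete list, and exact rational arithmetic from Python's `fractions` module is used so that equality can be tested without rounding.

```python
from fractions import Fraction
from itertools import product


def euler_X(primes):
    """X = prod over p of (1 - 1/p)^(-1), computed exactly as prod p / (p - 1)."""
    result = Fraction(1)
    for p in primes:
        result *= Fraction(p, p - 1)
    return result


def truncated_product(primes, K):
    """prod over p of (1 + 1/p + ... + 1/p^K), computed exactly."""
    result = Fraction(1)
    for p in primes:
        result *= sum(Fraction(1, p**e) for e in range(K + 1))
    return result


def sum_over_NK(primes, K):
    """Sum of 1/n over all n = p_1^e_1 ... p_r^e_r with 0 <= e_i <= K."""
    total = Fraction(0)
    for exponents in product(range(K + 1), repeat=len(primes)):
        n = 1
        for p, e in zip(primes, exponents):
            n *= p**e
        total += Fraction(1, n)
    return total


primes = [2, 3, 5]
for K in (1, 2, 3):
    lhs = truncated_product(primes, K)
    rhs = sum_over_NK(primes, K)
    assert lhs == rhs               # the expansion behind identity (1.1)
    assert lhs < euler_X(primes)    # each truncated factor is below the full one
    print(K, lhs, float(lhs))
```

Expanding the product is just the distributive law; the Fundamental Theorem of Arithmetic is what guarantees that each n arises from exactly one choice of exponents, so no term 1/n is counted twice.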
Let us recall why the harmonic series diverges to infinity. As with Theorem 1.2,
there are many ways to prove this; the first is elementary, while the
second compares the series with an integral.


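ELEMENTARY PROOF. Notice that


\[ 1 + \frac{1}{2} > \frac{1}{2}, \]
\[ \frac{1}{3} + \frac{1}{4} > \frac{1}{2}, \]
\[ \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} > \frac{1}{2}, \]
and so on. For any $k \ge 1$,
\[ \frac{1}{2^k + 1} + \frac{1}{2^k + 2} + \cdots + \frac{1}{2^{k+1}} > 2^k \cdot \frac{1}{2^{k+1}} = \frac{1}{2}. \]
This means that
\[ \sum_{n=1}^{2^{k+1}} \frac{1}{n} > \frac{k}{2} \quad \text{for all } k \ge 1, \]


and it follows that $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges.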
Hidden in the last argument is some indication of the rate at which the
harmonic series diverges. Since the sum of the first $2^{k+1}$ terms exceeds $k/2$,
the sum of the first $N$ terms must be approximately $C \log N$ for some positive
constant $C$. The second proof improves on this: Equation (1.2) gives a sharper
lower bound as well as an upper bound.


Exercise 1.1. Try to prove that $\sum_{n=1}^{\infty} \frac{1}{n^2}$ diverges using the same technique
of grouping terms together. Of course, this will not work since this series
converges, but you will see something mildly interesting. In particular, can
you use this to estimate the sum?
USING THE INTEGRAL TEST. Compare $\sum_{n=1}^{N} \frac{1}{n}$ with the integral
\[ \int_1^N \frac{1}{x}\,dx = \log N. \]
Figure 1.1 shows $\sum_{n=1}^{N} \frac{1}{n}$ trapped between $\int_1^{N+1} \frac{1}{x}\,dx$ and $1 + \int_1^{N} \frac{1}{x}\,dx$; in
general, it follows that


\[ \log(N+1) \le \sum_{n=1}^{N} \frac{1}{n} \le 1 + \log N. \tag{1.2} \]


This shows again that the harmonic series diverges and that the partial sum
of the first $N$ terms is approximately $\log N$.
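
The two-sided bound (1.2) is easy to test numerically. The following Python sketch is only an illustration; the function name `harmonic` is a hypothetical helper, and floating-point sums are assumed to be accurate enough for the sample values of N chosen here.

```python
from math import log


def harmonic(N):
    """Partial sum 1 + 1/2 + ... + 1/N of the harmonic series."""
    return sum(1.0 / n for n in range(1, N + 1))


for N in (1, 10, 100, 1000, 10**5):
    H = harmonic(N)
    lower, upper = log(N + 1), 1 + log(N)
    assert lower <= H <= upper      # inequality (1.2)
    print(N, round(lower, 4), round(H, 4), round(upper, 4))
```

The gap between the two bounds stays below 1 for every N, which is what makes "approximately log N" a meaningful statement.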


Figure 1.1. Graphs of $y = \frac{1}{x}$ and $y = \frac{1}{x-1}$ trapping the harmonic series.


This proof is a harbinger of more subtle results. Comparing series with 
integrals is a powerful technique; more generally, using analytic techniques 
to study properties of numbers has been one of the most important ideas in 
number theory. 


Exercise 1.2. Extend the method illustrated in Figure 1.1 to show that the
sequence $(a_n)$ defined by
\[ a_n = \sum_{m=1}^{n} \frac{1}{m} - \log n \]

is decreasing (that is, $a_{n+1} \le a_n$ for all $n$) and nonnegative. Deduce that it
converges to some number $\gamma$, and estimate $\gamma$ to three digits. This number
is known as the Euler-Mascheroni constant. It is not known if $\gamma$ is rational,
although it is expected not to be.
1.2 Summing Over the Primes


We begin this section with yet another proof that there are infinitely many 
primes. Recall that P denotes the set of prime numbers. 


Theorem 1.3. The series $\sum_{p \in \mathbb{P}} \frac{1}{p}$ diverges.


Several proofs are offered; each one provides different insights. We adopt
the convention that $p$ always denotes a prime so, for example, $\sum_{p > N} a_p$ denotes
$\sum_{p \in \mathbb{P},\, p > N} a_p$.

Notice that Theorem 1.3 tells us something about the sequence $(p_n)$ of
primes that begins $p_1 = 2$, $p_2 = 3$, $p_3 = 5, \ldots$. For example, the sequence
$(n^{1+\varepsilon}/p_n)$ cannot be bounded for any $\varepsilon > 0$.


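FIRST PROOF OF THEOREM 1.3. We argue by contradiction: Assume that
the series converges. Then there is some $N$ such that


\[ \sum_{p > N} \frac{1}{p} < \frac{1}{2}. \]
Let
\[ Q = \prod_{p \le N} p \]


be the product of all the primes less than or equal to $N$. The numbers
\[ 1 + nQ, \quad n \in \mathbb{N}, \]


are never divisible by primes less than or equal to $N$ because such primes do divide $Q$.
Now consider

\[ \sum_{t=1}^{\infty} \Bigl( \sum_{p > N} \frac{1}{p} \Bigr)^{t} < \sum_{t=1}^{\infty} \frac{1}{2^t} = 1. \]

We claim that

\[ \sum_{n=1}^{\infty} \frac{1}{1 + nQ} \le \sum_{t=1}^{\infty} \Bigl( \sum_{p > N} \frac{1}{p} \Bigr)^{t} \]

because every term on the left-hand side appears on the right-hand side at
least once. (Convince yourself of this claim by taking $N = 11$ and finding
some terms on the right-hand side.) It follows that


\[ \sum_{n=1}^{\infty} \frac{1}{1 + nQ} \le 1. \tag{1.3} \]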
However, the series in Equation (1.3) diverges since


\[ \sum_{n=1}^{K} \frac{1}{1 + nQ} \ge \frac{1}{2Q} \sum_{n=1}^{K} \frac{1}{n} \]


for any $K$, and the right-hand side diverges as $K \to \infty$. This contradiction
proves the theorem.


SECOND PROOF OF THEOREM 1.3. We will prove a stronger result, namely


\[ \sum_{p \le N} \frac{1}{p} > \log\log N - 2. \tag{1.4} \]

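Fix $N$ and let
\[ \mathcal{N}(N) = \{ n \in \mathbb{N} : \text{all prime factors of } n \text{ are less than or equal to } N \}. \]
Then (just as in Euler’s analytic proof of Theorem 1.2 on p. 8)


\[ \sum_{n \in \mathcal{N}(N)} \frac{1}{n} = \prod_{p \le N} \bigl( 1 + p^{-1} + p^{-2} + p^{-3} + \cdots \bigr) = \prod_{p \le N} \bigl( 1 - p^{-1} \bigr)^{-1}. \]


If $n \le N$, then certainly $n \in \mathcal{N}(N)$, so

\[ \sum_{n=1}^{N} \frac{1}{n} \le \sum_{n \in \mathcal{N}(N)} \frac{1}{n}. \]

It follows by Equation (1.2) that


\[ \log N \le \sum_{n \in \mathcal{N}(N)} \frac{1}{n} = \prod_{p \le N} \bigl( 1 - p^{-1} \bigr)^{-1}. \tag{1.5} \]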


In order to estimate the right-hand side of Equation (1.5), we need the
following bound. For any $v \in [0, 1/2]$,


\[ \frac{1}{1 - v} \le e^{v + v^2}. \tag{1.6} \]


To see why the bound (1.6) holds, let $f(v) = (1 - v)\exp(v + v^2)$. Then
\[ f'(v) = v(1 - 2v)\exp(v + v^2) \ge 0 \quad \text{for } v \in [0, \tfrac{1}{2}], \]


so the fact that $f(0) = 1$ implies that $f(v) \ge 1$ for all $v \in [0, 1/2]$.
For any prime $p$, $v = \frac{1}{p} \le \frac{1}{2}$, so by the bound (1.6)
\[ \prod_{p \le N} \bigl( 1 - p^{-1} \bigr)^{-1} \le \prod_{p \le N} \exp\bigl( p^{-1} + p^{-2} \bigr). \]


Combining this with Equation (1.5) and taking logarithms gives


\[ \log\log N \le \sum_{p \le N} \bigl( p^{-1} + p^{-2} \bigr). \tag{1.7} \]


Finally, we observe that
\[ \sum_{p} \frac{1}{p^2} < \sum_{n=2}^{\infty} \frac{1}{n^2} < 1, \tag{1.8} \]


so the contribution to the right-hand side of Equation (1.7) from $\sum_{p \le N} p^{-2}$ is
bounded independently of $N$. This completes the second proof of Theorem 1.3.
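
The lower bound (1.4) can be compared with actual values of the sum over primes. The sketch below is a numerical illustration only, not part of the proof: `primes_up_to` and `prime_reciprocal_sum` are hypothetical helper names, and a basic sieve is assumed to be fast enough for the modest bounds used.

```python
from math import isqrt, log


def primes_up_to(limit):
    """Return the list of primes <= limit using a simple sieve."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if is_prime[i]:
            # cross out i*i, i*i + i, ...; smaller multiples are already gone
            is_prime[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if is_prime[n]]


def prime_reciprocal_sum(limit):
    """Sum of 1/p over the primes p <= limit."""
    return sum(1.0 / p for p in primes_up_to(limit))


for N in (10, 100, 10**4, 10**6):
    s = prime_reciprocal_sum(N)
    assert s > log(log(N)) - 2      # inequality (1.4)
    print(N, round(s, 4), round(log(log(N)), 4))
```

The printed columns grow extremely slowly, which gives some feeling for how weakly the series in Theorem 1.3 diverges.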


Exercise 1.3. Prove the second inequality in Equation (1.8) using the integral
test: Show that


\[ \sum_{n=2}^{N} \frac{1}{n^2} < \int_1^N \frac{dx}{x^2} \quad \text{for all } N \ge 2. \]


In fact, an estimate stronger than Equation (1.4) holds. Mertens showed
that there is a constant $A$ (approximately 0.261) such that


\[ \sum_{p \le N} \frac{1}{p} = \log\log N + A + O\!\left( \frac{1}{\log N} \right). \tag{1.9} \]

Exercise 1.4. Is it possible to prove Equation (1.9) with $O(1)$ in place of


\[ A + O\!\left( \frac{1}{\log N} \right) \]


using only the methods of the second proof of Theorem 1.3?


The third proof of Theorem 1.3 extends the relationship between products
such as $\prod_{p \in \mathbb{P}} (1 - p^{-1})^{-1}$ and the harmonic series to a factorization of a
function that will later turn out to have a starring role.


Definition 1.4. The Riemann zeta function is defined by
\[ \zeta(\sigma) = \sum_{n=1}^{\infty} \frac{1}{n^{\sigma}} \]


wherever this makes sense.

Figure 1.2. The graph of $\zeta(\sigma)$ for $1 < \sigma \le 20$.


Understanding the properties of this function turns out to be the key to
many deeper properties of the prime numbers. For now, we simply think of $\sigma$
as being a real number and note that the series defining $\zeta(\sigma)$ converges by the
integral test for $\sigma > 1$ to a positive sum and diverges at $\sigma = 1$. For $\sigma > 1$, $\zeta(\sigma)$
is a decreasing function of $\sigma$.

Viewed as a real function of a real variable, the zeta function does not look
particularly subtle or useful. Figure 1.2 shows the graph of $\zeta(\sigma)$ for $1 < \sigma \le 20$.
Some indication of just how complicated this function really is appears when
it is viewed as a complex-valued function of a complex variable. It is clear
that the series defining the zeta function converges for $s = \sigma + it$ when $\sigma > 1$
(see p. 166 for more on this). Figure 1.3 shows the function $\Re(\zeta(2 + it))$
for $0 \le t \le 60$, giving the first insight into the complex properties of the zeta
function.

In Chapter 8, the Riemann zeta function is extended to a complex analytic
function defined on the whole complex plane with the exception of a single
pole, and this opens up the most mysterious aspect of the zeta function: its
behavior along the line $\Re(s) = \frac{1}{2}$. Figure 9.1 on p. 186 gives some idea of how
complicated this is.

Recall that p will be used to denote a prime number, so a product over 
the variable p means a product over p € P. 

The first step in understanding the zeta function is the Euler product 
representation, which is a factorization of the zeta function into terms corre- 
sponding to primes. The idea of factorizing a function will be discussed again 
at the start of Chapter 9. 


Theorem 1.5. [EULER PRODUCT REPRESENTATION] For any $\sigma > 1$,


\[ \zeta(\sigma) = \prod_{p} \bigl( 1 - p^{-\sigma} \bigr)^{-1}. \]


Figure 1.3. The graph of $\Re(\zeta(2 + it))$ for $0 \le t \le 60$.


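PROOF. For any $\sigma > 1$,


\[ \bigl( 1 - 2^{-\sigma} \bigr) \zeta(\sigma) = \sum_{n=1}^{\infty} \frac{1}{n^{\sigma}} - \sum_{n=1}^{\infty} \frac{1}{(2n)^{\sigma}} = 1 + \sum_{p \mid n \Rightarrow p > 2} \frac{1}{n^{\sigma}}, \]


where the last sum is taken over those $n$ with all prime factors greater than 2
(that is, the odd numbers greater than 2).

Now let $P$ be a large prime and repeat the same argument with each of
the primes $3, 5, \ldots, P$ in turn. This gives


\[ \bigl( 1 - 2^{-\sigma} \bigr) \bigl( 1 - 3^{-\sigma} \bigr) \bigl( 1 - 5^{-\sigma} \bigr) \cdots \bigl( 1 - P^{-\sigma} \bigr) \zeta(\sigma) = 1 + \sum_{p \mid n \Rightarrow p > P} \frac{1}{n^{\sigma}}. \]


The last sum ranges over those $n$ with the property that all the prime factors
of $n$ are greater than $P$. Thus the last sum is a subsum of the tail of the
convergent series defining $\zeta(\sigma)$, and in particular it must tend to zero as $P$
goes to infinity. It follows that
\[ \lim_{P \to \infty} \bigl( 1 - 2^{-\sigma} \bigr) \bigl( 1 - 3^{-\sigma} \bigr) \bigl( 1 - 5^{-\sigma} \bigr) \cdots \bigl( 1 - P^{-\sigma} \bigr) \zeta(\sigma) = 1, \]


so

\[ \zeta(\sigma) = \prod_{p} \bigl( 1 - p^{-\sigma} \bigr)^{-1}. \]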


Remark 1.6. An infinite product is defined to be convergent if the corresponding
partial products form a convergent sequence that does not converge to
zero. The nonzero condition is imposed to allow us to take logarithms of in- 
finite products, thereby connecting infinite products and infinite sums in a 
meaningful way. 
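
The Euler product representation can be watched converging numerically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the exponent is fixed at 2, and the comparison value pi^2/6 for the zeta function at 2 is Euler's classical evaluation, which is not derived in the text above.

```python
from math import isqrt, pi


def primes_up_to(limit):
    """Return the list of primes <= limit using a simple sieve."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if is_prime[n]]


def zeta_partial_sum(sigma, terms):
    """Sum of n^(-sigma) for 1 <= n <= terms."""
    return sum(n ** -sigma for n in range(1, terms + 1))


def euler_partial_product(sigma, P):
    """Product of (1 - p^(-sigma))^(-1) over the primes p <= P."""
    result = 1.0
    for p in primes_up_to(P):
        result *= 1.0 / (1.0 - p ** -sigma)
    return result


target = pi**2 / 6   # assumed known value of the zeta function at 2
for P in (10, 100, 1000):
    print(P, euler_partial_product(2, P), target)
print(zeta_partial_sum(2, 10**5), target)
```

Every factor in the product exceeds 1, so the partial products increase toward the limit from below, in line with the nonzero limit required in Remark 1.6.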
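THIRD PROOF OF THEOREM 1.3. Taking logarithms of the Euler product
representation shows that, for any $\sigma > 1$,


\[ \log \zeta(\sigma) = - \sum_{p} \log\bigl( 1 - p^{-\sigma} \bigr) = \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m p^{m\sigma}} = \sum_{p} \frac{1}{p^{\sigma}} + \sum_{p} \sum_{m=2}^{\infty} \frac{1}{m p^{m\sigma}}. \tag{1.10} \]


Notice that the series involved converge absolutely, so rearrangement is
permissible. For any prime $p$,

\[ \sum_{m=2}^{\infty} \frac{1}{m p^{m\sigma}} \le \sum_{m=2}^{\infty} \frac{1}{p^{m\sigma}} = \frac{1}{p^{2\sigma}} \cdot \frac{1}{1 - p^{-\sigma}} \le \frac{2}{p^{2\sigma}}, \]

so

\[ \sum_{p} \sum_{m=2}^{\infty} \frac{1}{m p^{m\sigma}} \le 2 \sum_{p} \frac{1}{p^{2\sigma}} \le 2\zeta(2), \]

which shows that the last double sum in Equation (1.10) is bounded. The
bound $2\zeta(2)$ holds for any $\sigma \ge 1$, and the double sum converges for $\sigma > \frac{1}{2}$.
Thus
\[ \log \zeta(\sigma) = \sum_{p} \frac{1}{p^{\sigma}} + O(1). \]
The left-hand side goes to infinity as $\sigma$ tends to 1 from above, so the sum on
the right-hand side must do the same.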


1.3 Listing the Primes 


Early in the history of the subject, Eratosthenes¹ devised a kind of sieve for
listing the primes. To illustrate his method (the sieve of Eratosthenes) we
consider the problem of finding all the primes up to 50. First arrange all the
integers between 1 and 50 in a grid.


¹ Eratosthenes of Cyrene (276 B.C.-194 B.C.) was born in what is now Libya. He
made major contributions to many subjects, including finding surprisingly accurate
estimates for the circumference of the Earth and the distances from the
Earth to the Sun and the Moon.
 1  2  3  4  5  6  7  8  9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50


Now do the sieving: Eliminate 1, then start with 2 and cross out all num- 
bers greater than 2 and divisible by 2. Then take the next surviving number 3 
and cross out all the multiples of 3 that are greater than 3. Repeat with 
the next surviving number and continue until the numbers divisible by 7 are 
crossed out. 


Exercise 1.5. Why can you stop sieving once you get to 7? 


The remaining numbers are the prime numbers below 50, as shown below. 


    2  3     5     7
11    13          17    19
      23                29
31                37
41    43          47


Understanding the patterns of the surviving numbers remains one of the great 
challenges facing mathematics two thousand years after Eratosthenes. 

This method has great value, allowing people throughout history to rapidly 
create lists of primes. It fails to meet our longer-term objectives, however. It
elegantly and efficiently produces lists of primes without having to do trial 
divisions but does not help to decide if a given large number (with hundreds 
of digits, for example) is prime. 
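
The sieving procedure described above translates almost line by line into code. The following Python sketch is one possible rendering, not a definitive implementation: the function name `sieve_of_eratosthenes` and the stopping rule written as `p * p <= limit` (which stops at 7 when the limit is 50) are choices made here for illustration.

```python
def sieve_of_eratosthenes(limit):
    """Return all primes <= limit by crossing out multiples, as in the grid."""
    crossed_out = [False] * (limit + 1)
    crossed_out[0] = True            # 0 is not in the grid
    if limit >= 1:
        crossed_out[1] = True        # eliminate 1 first
    p = 2
    while p * p <= limit:
        if not crossed_out[p]:       # p is the next surviving number
            for multiple in range(2 * p, limit + 1, p):
                crossed_out[multiple] = True
        p += 1
    return [n for n in range(limit + 1) if not crossed_out[n]]


print(sieve_of_eratosthenes(50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

Note that the loop never performs a division: it only steps through multiples, which is why the method produces lists of primes quickly but says nothing about an isolated large number.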


Table 1.1. Early prime hunters. 


Name                   Date   Bound
Pietro Cataldi         1588   750
T. Brancker            1688   100000
Felkel Kulik           1876   100330200
Derrick Henry Lehmer   1909   10006721


Table 1.1 is a short list of some of the calculations of prime tables in 
recent history; in each case all the primes up to the bound were listed. A 
rather different problem is to find exactly how many primes there are below 
a certain bound (without finding them all). Kulik listed the smallest factors 
of all the integers up to his bound and in particular found all the primes up 
to his bound. Lehmer’s table was widely distributed and as a result was very 
influential (despite being shorter than Kulik’s table). 
1.3.1 Functions that Generate Primes


In the seventeenth century attention turned to finding formulas that would 
generate the primes. Euler pointed out the following polynomial example. 


Example 1.7. The polynomial $x^2 + x + 41$ yields prime values for $0 \le x \le 39$,
but $x = 40, 41$ do not yield primes.


What is striking about this example is that it is prime for many values in 
succession relative to the size of the coefficients and the degree. 
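
Example 1.7 can be confirmed directly by machine. The sketch below is a small check written for illustration; the names `is_prime` and `euler_poly` are hypothetical helpers, and trial division is assumed to be adequate for numbers of this size.

```python
def is_prime(n):
    """Primality test by trial division, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def euler_poly(x):
    """Euler's polynomial x^2 + x + 41."""
    return x * x + x + 41


assert all(is_prime(euler_poly(x)) for x in range(40))   # x = 0, ..., 39
print(euler_poly(40), euler_poly(41))   # 1681 = 41**2 and 1763 = 41 * 43
```

The two failures are visible algebraically as well: at x = 40 the value is 40 * 41 + 41, and at x = 41 every term is divisible by 41.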


Exercise 1.6. (a) [GOLDBACH 1752] Prove that if $f \in \mathbb{Z}[x]$ has the property
that $f(n)$ is prime for all $n \ge 1$, then $f$ must be a constant.

(b) Extend your argument to show that if $f \in \mathbb{Z}[x]$ has the property that $f(n)$
is prime for all $n \ge N$ for some $N$, then $f$ must be a constant.

(c) Let $P \in \mathbb{Z}[x_1, \ldots, x_k]$ be a polynomial in $k \ge 2$ variables with integer
coefficients. Define a function $f$ by $f(n) = P(n, 2^n, 3^n, \ldots, (k-1)^n)$, and
assume that $f(n) \to \infty$ as $n \to \infty$. Show that $f(n)$ is composite for infinitely
many values of $n$.


Remarkably, there is an explicit integral polynomial in several variables 
whose set of positive values as the variables run through the nonnegative 
integers coincides with the primes. This polynomial was discovered as a by- 
product of research into Hilbert’s 10th Problem, which asked if there could 
be an algorithm to determine if a polynomial Diophantine² problem has a
solution. However, once again, this is useless with regard to the aim of finding 
ways to generate primes efficiently. 

There are ingenious “formulas” for the primes. Many of these require 
knowledge of the first (n — 1) primes to produce the nth prime, and none 
of them seem to be computationally useful. We will prove one striking result 
of this kind here, and two further results in Exercise 1.24 on p. 33 and in 
Exercise 8.9 on p. 163. The result proved here rests on Bertrand’s Postulate, 
which is the first of many results that say something about how the prime 
numbers appear and how the next prime compares in size with the previous 
prime. The arguments below are intricate but elementary, and the basic con- 
tradiction arrived at in the proof of Theorem 1.9 is similar to one that will be 
used to prove Zsigmondy’s Theorem (Theorem 1.15) in Section 8.3.1. 

We need a lemma that says something about the growth in the product of 
all the primes up to n. As usual p will be used to denote a prime. 


Lemma 1.8. For any $n \ge 1$,


\[ \sum_{p \le n} \log p < 2n \log 2. \tag{1.11} \]


² Diophantine problems are discussed in Chapter 2. The term is used to denote
problems involving equations in which only integer solutions are sought. 
PROOF. Let


\[ M = \binom{2m+1}{m} = \frac{(2m+1)(2m) \cdots (m+2)}{m!}. \]


This is a binomial coefficient, so it is an integer (see Exercise 1.10 for a stronger
form of this). The coefficient $M$ appears twice in the binomial expansion
of $2^{2m+1} = (1+1)^{2m+1}$, so $M < 2^{2m}$. If $m+1 < p \le 2m+1$ for some prime $p$,
then $p$ divides the numerator of $M$ but does not divide the denominator, so


\[ \prod_{p \in A(m)} p \ \text{ divides } M, \]


where $A(m)$ denotes the set of primes $p$ with $m+1 < p \le 2m+1$. It follows
that


\[ \sum_{p \le 2m+1} \log p - \sum_{p \le m+1} \log p = \sum_{p \in A(m)} \log p \le \log M < 2m \log 2. \tag{1.12} \]


We now prove Equation (1.11) by induction. It holds for $n \le 2$, so suppose it
holds for all $n \le k-1$. If $k$ is even, then


\[ \sum_{p \le k} \log p = \sum_{p \le k-1} \log p < 2(k-1) \log 2 < 2k \log 2 \]
by the inductive hypothesis. If $k$ is odd, write $k = 2m+1$ and then


\[ \sum_{p \le 2m+1} \log p = \sum_{p \le 2m+1} \log p - \sum_{p \le m+1} \log p + \sum_{p \le m+1} \log p \]
\[ < 2m \log 2 + 2(m+1) \log 2 \]
\[ = 2(2m+1) \log 2 = 2k \log 2, \]


since $m+1 < k$. Thus the inequality (1.11) holds for all $n$ by induction.
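
Exponentiating (1.11) gives an equivalent statement about integers: the product of all primes up to n is less than 4^n. The sketch below checks this form exactly for small n. It is a sanity check under stated assumptions, not a substitute for the induction: the function name `primorial_below_four_to_n` is hypothetical, and the bound 2000 is an arbitrary choice.

```python
def primorial_below_four_to_n(limit):
    """Check prod_{p <= n} p < 4**n exactly for every 1 <= n <= limit."""
    is_composite = [False] * (limit + 1)
    running_product = 1                      # product of the primes found so far
    for n in range(1, limit + 1):
        if n >= 2 and not is_composite[n]:   # n is prime
            running_product *= n
            for multiple in range(n * n, limit + 1, n):
                is_composite[multiple] = True
        if running_product >= 1 << (2 * n):  # 4**n written as a bit shift
            return False
    return True


print(primorial_below_four_to_n(2000))
```

Working with exact integers avoids any doubt about floating-point logarithms near the boundary of the inequality.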


Theorem 1.9. [BERTRAND’S POSTULATE] If $n \ge 1$, then there is at least one
prime $p$ with the property that


\[ n < p \le 2n. \tag{1.13} \]


ProoF. For any real number 2, let |a| denote the integer part of «. Thus || 
is the greatest integer less than or equal to x. Let p be any prime. Then 


a-lsblal~ 


is the largest power of p dividing n! (see Exercise 8.7(a) on p. 162). Fix n > 1 
and let 
\[ N = \prod_{p \le 2n} p^{k(p)} \]


be the prime decomposition of $N = (2n)!/(n!)^2$. The number of times that
a given prime $p$ divides $N$ is the difference between the number of times it
divides $(2n)!$ and $(n!)^2$, so


\[ k(p) = \sum_{m=1}^{\infty} \left( \left\lfloor \frac{2n}{p^m} \right\rfloor - 2 \left\lfloor \frac{n}{p^m} \right\rfloor \right), \tag{1.14} \]


and each of the terms in the sum is either 0 or 1, depending on whether $\lfloor 2n/p^m \rfloor$
is even or odd. If $p^m > 2n$ the term is certainly 0, so


\[ k(p) \le \left\lfloor \frac{\log 2n}{\log p} \right\rfloor. \tag{1.15} \]


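Now the proof of the theorem proceeds by a contradiction argument. Assume
there is some $n \ge 1$ for which there is no prime satisfying the inequality (1.13),
and let $p$ be a prime factor of $N = (2n)!/(n!)^2$. Thus $p \le n$ by our
assumption, and $k(p) \ge 1$. If


\[ \tfrac{2}{3} n < p \le n, \]


then

\[ 2p \le 2n < 3p \quad\text{and}\quad p^2 > \tfrac{4}{9} n^2 > 2n \]

(the last inequality requires $n > 4$; smaller values of $n$ are covered by the direct
calculation at the end of the proof), so Equation (1.14) becomes


\[ k(p) = \left\lfloor \frac{2n}{p} \right\rfloor - 2 \left\lfloor \frac{n}{p} \right\rfloor = 2 - 2 = 0. \]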


We deduce that $p \le \tfrac{2}{3} n$ for every prime factor $p$ of $N$. It follows that


\[ \sum_{p \mid N} \log p \le \sum_{p \le 2n/3} \log p < \tfrac{4}{3} n \log 2 \tag{1.16} \]

by Lemma 1.8. Now if $k(p) \ge 2$ then by the bound (1.15),

\[ 2 \log p \le k(p) \log p \le \log 2n, \]


so $p \le \sqrt{2n}$ and thus there are at most $\sqrt{2n}$ possible values of $p$. Hence


\[ \sum_{k(p) \ge 2} k(p) \log p \le \sqrt{2n} \log 2n. \]


Together with the inequality (1.16), this shows that

\[
\begin{aligned}
\log N &\le \sum_{k(p) = 1} \log p + \sum_{k(p) \ge 2} k(p) \log p \\
&\le \sum_{p \mid N} \log p + \sqrt{2n} \log 2n \\
&\le \tfrac{4}{3} n \log 2 + \sqrt{2n} \log 2n.
\end{aligned}
\tag{1.17}
\]


Now $N$ is the largest coefficient (namely the middle one) in the binomial
expansion of $2^{2n} = (1+1)^{2n}$, so


\[ 2^{2n} = 2 + \binom{2n}{1} + \binom{2n}{2} + \cdots + \binom{2n}{2n-1} \le 2nN. \]


Substituting this estimate into the inequality (1.17) gives


\[ 2n \log 2 \le \tfrac{4}{3} n \log 2 + \log 2n + \sqrt{2n} \log 2n. \tag{1.18} \]


It is clear that the inequality (1.18) cannot hold for large values of n; a simple 
calculation shows that (1.18) implies that n does not exceed 500. 

It follows that if $n > 500$, then there is a prime satisfying the inequality (1.13).
A calculation confirms that (1.13) also holds for all $n \le 500$, completing
the proof of the theorem.
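
The two numerical claims that close the proof can be checked directly. The sketch below is our own illustration (the function names are hypothetical, and inequality (1.18) is evaluated in floating point): it searches for a prime in $(n, 2n]$ for each $n \le 500$, and finds the largest $n$ in a test range for which (1.18) can still hold.

```python
from math import log, sqrt


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def bertrand_witness(n):
    """Return the smallest prime p with n < p <= 2n, or None if there is none."""
    for p in range(n + 1, 2 * n + 1):
        if is_prime(p):
            return p
    return None


def inequality_1_18_holds(n):
    """Evaluate 2n log 2 <= (4/3) n log 2 + log 2n + sqrt(2n) log 2n."""
    return 2 * n * log(2) <= (4 / 3) * n * log(2) + log(2 * n) + sqrt(2 * n) * log(2 * n)


# Every n from 1 to 500 has a prime in (n, 2n].
small_cases_ok = all(bertrand_witness(n) is not None for n in range(1, 501))

# Largest n below 2000 for which (1.18) is not yet violated.
largest_n = max(n for n in range(1, 2000) if inequality_1_18_holds(n))

print(small_cases_ok, largest_n)
```

The search range of 2000 is an arbitrary choice for the illustration; that (1.18) fails for all larger $n$ follows because its left side grows linearly while its right side, after cancelling the $\tfrac{4}{3} n \log 2$ term, grows only like $\sqrt{n} \log n$.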


Notice that a consequence of Equation (1.13) is that if the primes are listed
in order as $p_1, p_2, \ldots$, then


\[ p_{n+1} < 2p_n \quad\text{for all } n \ge 1. \tag{1.19} \]


It is clear that Theorem 1.9 gives another proof that there must be in- 
finitely many primes. In each interval of the form (n,2n] there is at least one. 
This gives us a bound for the prime counting function 


\[ \pi(X) = |\{ p \le X \mid p \in \mathbb{P} \}|. \]


The proof of Euclid’s Theorem 1.2 already says a little more than the purely
qualitative statement that $\pi(X) \to \infty$ as $X \to \infty$: from the proof of
Theorem 1.2 we see that

\[ p_{n+1} \le p_1 p_2 \cdots p_n + 1. \]

This tells us something about $\pi(X)$. Define a sequence $(u_n)$ by setting $u_1 = 2$
and $u_{n+1} = u_1 \cdots u_n + 1$ for $n \ge 1$. Then $p_n \le u_n$ for all $n$, so for $X \ge 2$


\[ \pi(X) \ge \max\{ n \mid u_n \le X \} = \min\{ n \mid u_n > X \} - 1. \]


The sequence $(u_n)$ grows extremely rapidly, so this lower bound grows extremely
slowly, and the bound obtained for $\pi(X)$ is very far from the truth.
Theorem 1.9 says more: there are at least $N$ primes in the interval

\[ (1, 2^N] = (1, 2] \cup (2, 4] \cup (4, 8] \cup \cdots \cup (2^{N-1}, 2^N], \]


so $\pi(2^N) \ge N$. It follows that $\pi(X)$ is larger than $C \log(X)$ for some positive
constant $C$, infinitely often. Something closer to the truth about the
asymptotic behavior of $\pi(X)$ is the Prime Number Theorem (Theorem 8.1).
Finding more refined estimates for $\pi(X)$ generally involves deep problems in
analytic number theory. An exception is the result of Tchebychef, described in
Exercise 8.7 on p. 162, which uses elementary methods to give better bounds
for $\pi(X)$.

Bertrand’s Postulate is enough to exhibit a striking but impractical for- 
mula for the primes. More importantly, the bound (1.13) immediately moti- 
vates the question of whether the upper estimate 2n could be reduced, perhaps 
for all large n only, and this is the subject of ongoing research. 


Corollary 1.10. There exists a real number $\theta$ with the property that


\[ \left\lfloor 2^{2^{\cdot^{\cdot^{\cdot^{2^{\theta}}}}}} \right\rfloor \]


is a prime number for any number of iterations of the exponential.


PROOF. Let $q_1$ be any prime, and choose a sequence of primes $(q_n)$ with the
property that

\[ 2^{q_n} < q_{n+1} < q_{n+1} + 1 < 2^{q_n + 1}. \tag{1.20} \]


This is possible by Bertrand’s Postulate. Now define functions $f^{(1)}, f^{(2)}, \ldots$
by $f^{(1)}(x) = \log_2(x)$ and $f^{(n+1)}(x) = \log_2(f^{(n)}(x))$ for $n \ge 1$. Define
sequences $(u_n)$ and $(v_n)$ by


\[ u_n = f^{(n)}(q_n) \quad\text{and}\quad v_n = f^{(n)}(q_n + 1). \]

By the inequality (1.20),

\[ q_n < f^{(1)}(q_{n+1}) < f^{(1)}(q_{n+1} + 1) < q_n + 1, \]

so by applying the increasing function $f^{(n)}$ we have

\[ u_n < u_{n+1} < v_{n+1} < v_n. \]


It follows that the sequence $(u_n)$ is increasing and bounded above, so it converges. Let

\[ \theta = \lim_{n \to \infty} u_n. \]

Define functions $g^{(n)}$ by $g^{(1)}(x) = 2^x$ and $g^{(n+1)}(x) = 2^{g^{(n)}(x)}$ for all $n \ge 1$.
Then

\[ g^{(n)}(u_n) < g^{(n)}(\theta) < g^{(n)}(v_n), \]

so

\[ q_n < g^{(n)}(\theta) < q_n + 1 \quad\text{for all } n \ge 1 \]


as required.


Exercise 1.7. [MILLS] A deep result of Ingham improves Equation (1.13) to
say that there is a constant $C$ such that

\[ p_{n+1} - p_n < C p_n^{5/8}. \]

Assuming this result, modify the proof of Corollary 1.10 to show that there
is a real number $\theta$ with the property that $\lfloor \theta^{3^n} \rfloor$ is a prime for all $n \ge 1$.


Exercise 1.8. [RICHERT] Use Theorem 1.9 to show that every integer greater 
than 6 is a sum of distinct primes. (Hint: Show this is true for the numbers 7 
to 19, then use Theorem 1.9 to see that we can keep adding new primes to 
the set of sums obtained without missing out any integers). 


Exercise 1.9. [DRESSLER] (a) Modify the proof of Theorem 1.9 to show that

\[ p_{n+1} < 2p_n - 10 \quad\text{for all } n > 6. \]


(Hint: Assume there is an integer $n > 1000$ for which no prime $p$ has the
property $n < p \le 2n - 10$, and consider the primes dividing $N = \binom{2n}{n}$.)
(b)* Use your result to prove that every positive integer apart from 1, 2, 4, 6
and 9 can be written as a sum of distinct odd primes.


1.3.2 Mersenne Primes 


Mersenne³ noticed that $2^2 - 1 = 3$, $2^3 - 1 = 7$, $2^5 - 1 = 31$, and $2^7 - 1 = 127$
are all primes. He suggested on the basis of experiments that $2^p - 1$ would be
a prime whenever $p$ is a prime that exceeds by 3 or less an even power of 2.


Lemma 1.11. If $2^n - 1$ is prime, then $n$ is prime.


PROOF. We prove the contrapositive statement that $n$ being composite
forces $2^n - 1$ to be composite. If $n = ab$ with $a, b > 1$, then


\[ 2^n - 1 = (2^a - 1)(2^{n-a} + 2^{n-2a} + \cdots + 2^a + 1), \]


so $2^n - 1$ is composite.


The list of primes noticed by Mersenne does not continue uninterrupted
because $2^{11} - 1$ is composite. A prime of the form $2^p - 1$ is known as a Mersenne


3 Marin Mersenne (1588-1648) was a French friar in the religious order of the 
Minims. He defended Descartes and Galileo against their theological critics and 
worked to undermine alchemy and astrology. He wrote on music as part of his 
studies in physics and mathematics. 
prime. The next few Mersenne primes are $2^{13} - 1$, $2^{17} - 1$ and $2^{19} - 1$. It is
not known if there are infinitely many Mersenne primes. That $2^{19} - 1$ is prime
was known to Cataldi in 1588, and this was the largest known prime for 150
years. Fermat discovered that $2^{23} - 1$ is not prime in 1640; in 1732 Euler knew
that $2^{29} - 1$ is not prime but that $2^{31} - 1$ is prime.

It is worth pausing to say something about how this knowledge, which 
potentially requires the factorization of ten-digit numbers, accrued. Generally 
this involved a mixture of improving technique with congruences, some guile, 
and some heroic calculations. The first of several theoretical advances was 
discovered by Fermat and is now known as Fermat’s Little Theorem. 


Theorem 1.12. [FERMAT’S LITTLE THEOREM] For any prime $p$ and any
integer $a$,

\[ a^p \equiv a \pmod{p}. \]


In keeping with our philosophy about differing approaches, we present two 
proofs of Fermat’s Little Theorem. 
COMBINATORIAL PROOF. It is enough to prove the statement when $a$ is a
positive integer, so we use induction. The result is true for $a = 1$ because
both sides are 1. Assume it is true for $a = b$. Now


\[ (b+1)^p = b^p + p b^{p-1} + \cdots + pb + 1 = \sum_{j=0}^{p} \binom{p}{j} b^j \]


by the Binomial Theorem. For $0 < j < p$, $\binom{p}{j} = \frac{p!}{j!\,(p-j)!}$ has a numerator
divisible by $p$ and denominator not divisible by $p$; the Fundamental Theorem
of Arithmetic then shows that $\binom{p}{j}$ is divisible by $p$ for $j = 1, \ldots, p-1$. So


\[ (b+1)^p \equiv b^p + 1 \equiv b + 1 \pmod{p} \]


by the inductive hypothesis. Thus Fermat’s Little Theorem is proved.


Exercise 1.10. Prove that the product of any $n$ successive integers is divisible
by $n!$.


A second, and often more useful, version of Fermat’s Little Theorem can
be written as follows. Integers $a$ and $b$ are said to be coprime if $\gcd(a, b) = 1$.
For all $a \in \mathbb{Z}$ that are coprime to $p$,


\[ a^{p-1} \equiv 1 \pmod{p}. \tag{1.21} \]

This form is easily seen to be equivalent to Theorem 1.12 as follows:

\[ a^p - a = a(a^{p-1} - 1), \]


so when $p$ does not divide $a$ the Fundamental Theorem of Arithmetic shows
that $p \mid (a^{p-1} - 1)$ if and only if $p \mid (a^p - a)$.
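
Both forms of the theorem are easy to test with modular exponentiation. The following is a small Python sketch of our own (the name `fermat_forms_hold` is hypothetical); it relies only on the built-in three-argument `pow`, which computes a power modulo a given integer.

```python
def fermat_forms_hold(p, a):
    """Check a^p = a (mod p) and, if p does not divide a, a^(p-1) = 1 (mod p)."""
    first_form = pow(a, p, p) == a % p
    if a % p == 0:
        # The second form (1.21) only applies to a coprime to p.
        return first_form
    second_form = pow(a, p - 1, p) == 1
    return first_form and second_form


for p in (2, 3, 5, 7, 11, 13):
    print(p, all(fermat_forms_hold(p, a) for a in range(-20, 21)))
```

Negative values of $a$ are included because Theorem 1.12 is stated for any integer $a$.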
The second proof of Fermat’s Little Theorem proves the congruence (1.21)
and uses slightly more sophisticated ideas from group theory. The virtue of 
this second proof is that it is quicker and (as we shall see) is better suited 
to generalization. It does require some properties of modular arithmetic (see 
Exercise 1.28 on p. 38). 

PROOF USING GROUP THEORY. Work in the group $G = (\mathbb{Z}/p\mathbb{Z})^*$ of nonzero
residues modulo $p$ under multiplication. The residue of $a$ generates a cyclic
subgroup of $G$ whose order must divide that of $G$ by Lagrange’s Theorem.
Since the order of $G$ is $(p-1)$, we deduce Equation (1.21).


This proof is something of an anachronism: Lagrange’s Theorem gener- 
alized Fermat’s Little Theorem. However, thinking of residues using group 
theory is a powerful tool and gives rise to many more results, so it is useful to 
begin thinking in those terms now. Exercise 3.6 on p. 62 gives a good example 
where a proof using group theory can be favourably compared with a proof 
that only uses congruences. 


Exercise 1.11. Fermat’s Little Theorem says that, for any odd prime $p$, $2^{p-1} - 1$
is divisible by $p$. It sometimes happens that $2^{p-1} - 1$ is divisible by $p^2$. Find all
the primes p with this property for p < 10°. Such primes are called Wieferich 
primes, and it is not known if there are infinitely many of them. 


Exercise 1.12. *A pair of congruences that arises in the Catalan problem 
(see p. 57) for odd primes p, q is 


\[ p^{q-1} \equiv 1 \pmod{q^2} \quad\text{and}\quad q^{p-1} \equiv 1 \pmod{p^2}. \tag{1.22} \]


A pair of odd primes satisfying Equation (1.22) is called a Wieferich pair. 
Find all the Wieferich pairs with p,q < 10+. 


Exercise 1.13. An integer $n$ is called a perfect number if it is equal to the
sum of its proper divisors. Thus $6 = 1 + 2 + 3$ is a perfect number.

(a) If $q = 2^p - 1$ is a Mersenne prime, prove that $2^{p-1} q$ is a perfect number.
(b) Prove that if $n$ is an even perfect number, then $n$ has the form $2^{p-1}(2^p - 1)$
for some prime of the form $2^p - 1$.


It is not known if there are any odd perfect numbers, but there are certainly
no odd perfect numbers smaller than $10^{300}$.

Write $M_n = 2^n - 1$ for the $n$th Mersenne number. The Mersenne numbers
have special properties that make them particularly suitable for primality
testing. The next result is the first of a series of results showing that divisors
of $M_n$ are quite prescribed when $n$ is prime.


Lemma 1.13. Suppose $p$ is a prime and $q$ is a nontrivial prime divisor of $M_p$.
Then $q \equiv 1$ modulo $p$.
Again, we give two proofs.
PROOF USING THE EUCLIDEAN ALGORITHM. The condition that $q$ divides
$M_p$ amounts to

\[ 2^p \equiv 1 \pmod{q}. \]


By Fermat’s Little Theorem, $2^{q-1} \equiv 1$ modulo $q$. Let $d = \gcd(p, q-1)$.
If $d = p$, then $p \mid (q-1)$ as required. The only other possibility is $d = 1$ since $p$
is prime. By Theorem 1.23 (see p. 35), in this case there are integers $a$ and $b$
with $1 = pa + (q-1)b$. Notice that one of $a$ and $b$ must be negative. Now


\[ 2 = 2^1 = 2^{pa + (q-1)b} = (2^p)^a (2^{q-1})^b \equiv 1 \pmod{q}, \tag{1.23} \]


which is impossible as $q > 1$, so the result is proved.


In the preceding argument, we have made use of negative exponents of
expressions modulo $q$, but only in the form


\[ 1^{-a} \equiv 1 \pmod{q} \quad\text{for } a > 0. \tag{1.24} \]


PROOF USING GROUP THEORY. Work in the group $G$ of nonzero residues
modulo $q$. In this group 2 generates a cyclic subgroup whose order divides $p$
since $2^p - 1 \equiv 0$ modulo $q$. Since 2 is not the identity and $p$ is prime, the
order of 2 must be $p$. Again, by Lagrange’s Theorem, this order must divide
the order of the group $G$, which is $(q-1)$.


Example 1.14. Lemma 1.13 is a significant help in factorizing $M_p$. To see how
this works, we present Fermat’s proof from 1640 that $2^{23} - 1$ is not prime. If $q$
is a prime dividing $2^{23} - 1$, then $q \equiv 1$ modulo 23. Now $23n + 1$ is a prime
smaller than $\sqrt{2^{23} - 1}$ only for


\[ n = 2, 6, 12, 20, 26, 30, 36, 42, 44, 50, 56, 60, 62, 72, 84, 86, 102, 104, 110. \]


Trial division shows that $M_{23}$ is divisible by the first of the resulting numbers,
47. In general, there is no reason to expect the smallest possible candidate
to be a divisor, but even if the largest were the first such divisor, only 19 trial
divisions are involved.
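
The search in Example 1.14 can be sketched in a few lines of Python. This is an illustration under our own naming (`mersenne_factor` is not from the text), and it assumes the exponent $p$ passed in is prime, as Lemma 1.13 requires: only candidates $q = pn + 1$ up to $\sqrt{M_p}$ are tried.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def mersenne_factor(p):
    """Smallest prime q = p*n + 1 <= sqrt(M_p) dividing M_p = 2^p - 1, or None."""
    m = (1 << p) - 1
    bound = isqrt(m)
    q = p + 1  # candidate with n = 1
    while q <= bound:
        if is_prime(q) and m % q == 0:
            return q
        q += p  # next candidate congruent to 1 modulo p
    return None


q = mersenne_factor(23)
print(q, ((1 << 23) - 1) // q)
```

A return value of `None` means that no prime divisor of the permitted shape exists below $\sqrt{M_p}$, so that $M_p$ is itself prime.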


In 1876, Lucas discovered a test for proving the primality of Mersenne
numbers. Using this test, he proved that


\[ 2^{127} - 1 = 170141183460469231731687303715884105727 \]


is prime, but $2^{67} - 1$ is not. This disproved the suggestion of Mersenne.

The latter number occupies a special place in the history (and folklore) of 
mathematics. First, Lucas showed it is not prime but was not able to exhibit a 
nontrivial factor, which might seem a remarkable idea. In fact, it is something 
we will encounter again in the computational number theory sections. Second, 
this number was the subject of a famous talk given by Prof. F. N. Cole to
the American Mathematical Society in 1903 entitled “On the Factorization
of Large Numbers.” On one blackboard, he wrote out the decimal expansion
of $2^{67} - 1$ and on another he proceeded to compute the product of 193707721
and 761838257287, thereby showing them to be equal. The legend goes that
after this silent lecture he sat down to “prolonged applause.”

The specific arithmetic properties of Mersenne numbers mean that results
on the primality of later terms in the sequence sometimes predated results on
earlier terms. For example, $2^{127} - 1$ was shown to be prime in 1876 while $2^{89} - 1$
and $2^{107} - 1$ were shown to be prime in 1911 and 1914, respectively.


Exercise 1.14. *[LUCAS-LEHMER TEST] Define an integer sequence by

\[ S_1 = 4 \quad\text{and}\quad S_{n+1} = S_n^2 - 2 \quad\text{for } n \ge 1. \]


Let $p$ be an odd prime. Prove that $M_p = 2^p - 1$ is a prime if and only
if $S_{p-1} \equiv 0$ modulo $M_p$.
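
Although the exercise asks for a proof, the test itself is short enough to run. The following Python sketch is our own (the function name `lucas_lehmer` is an assumption); it computes $S_{p-1}$ modulo $M_p$ by reducing after every squaring, so the numbers never grow beyond about $M_p^2$.

```python
def lucas_lehmer(p):
    """Return True if S_{p-1} = 0 (mod M_p), for an odd prime p."""
    m = (1 << p) - 1          # M_p = 2^p - 1
    s = 4                     # S_1
    for _ in range(p - 2):    # S_2, ..., S_{p-1}, each reduced modulo M_p
        s = (s * s - 2) % m
    return s == 0


odd_primes = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 61, 67, 89, 107, 127)
print([p for p in odd_primes if lucas_lehmer(p)])
```

The exponents 11, 23, 29 and 67 are rejected, in agreement with the composite Mersenne numbers mentioned in the text.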


1.3.3 Zsigmondy’s Theorem 


Although the proof of the conjecture that there are infinitely many Mersenne 
primes seems a long way off, it is known that the sequence starts to produce 
new prime factors very quickly. A prime $p$ is a primitive divisor of $M_n$ if $p$
divides $M_n$ but does not divide $M_m$ for any $m < n$. Table 1.2 shows the prime
factorization of $M_n$ for $2 \le n \le 24$, with primitive divisors shown in bold.

The pattern that seems to emerge from Table 1.2 turns out to reflect 
something genuine. Sequences such as the Mersenne sequence, after a few 
initial terms, always have primitive divisors. 


Theorem 1.15. [ZSIGMONDY] Let $M_n = 2^n - 1$. Then for every $n \ne 6$, $n > 1$,
the term $M_n$ has a primitive divisor.


As seen in Table 1.2, $M_6$ does not have a primitive divisor, so this result
is optimal. The proof of Theorem 1.15 is presented in Section 8.3.1 on p. 167,
after we have proved the Möbius inversion formula (Theorem 8.15). A basic
result that will be needed for the proof can be proved now, using the Binomial
Theorem. Notice that this result, proved as the next exercise, already shows
that the divisors of the sequence $(M_n)$ have a special structure.


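Exercise 1.15. Let $p$ denote a prime, and for any integer $N$, define $\operatorname{ord}_p(N)$
to be the exact power of $p$ that divides $N$. Thus $\operatorname{ord}_p(N) = a$ means $p^a \mid N$
but $p^{a+1} \nmid N$.

(a) Prove that $\operatorname{ord}_p$ behaves like a logarithm in the sense that


\[ \operatorname{ord}_p(xy) = \operatorname{ord}_p(x) + \operatorname{ord}_p(y) \]


for all integers $x, y$.
(b) Prove that if $p \mid M_n$ then $\operatorname{ord}_p(M_{kn}) = \operatorname{ord}_p(M_n) + \operatorname{ord}_p(k)$.
(c) Deduce that $\gcd(M_n, M_m) = M_{\gcd(n,m)}$ for all $m, n$.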
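Table 1.2. Primitive divisors of $(M_n)$.


  n        M_n   Factorization
  2          3   $\mathbf{3}$
  3          7   $\mathbf{7}$
  4         15   $3 \cdot \mathbf{5}$
  5         31   $\mathbf{31}$
  6         63   $3^2 \cdot 7$
  7        127   $\mathbf{127}$
  8        255   $3 \cdot 5 \cdot \mathbf{17}$
  9        511   $7 \cdot \mathbf{73}$
 10       1023   $3 \cdot \mathbf{11} \cdot 31$
 11       2047   $\mathbf{23} \cdot \mathbf{89}$
 12       4095   $3^2 \cdot 5 \cdot 7 \cdot \mathbf{13}$
 13       8191   $\mathbf{8191}$
 14      16383   $3 \cdot \mathbf{43} \cdot 127$
 15      32767   $7 \cdot 31 \cdot \mathbf{151}$
 16      65535   $3 \cdot 5 \cdot 17 \cdot \mathbf{257}$
 17     131071   $\mathbf{131071}$
 18     262143   $3^3 \cdot 7 \cdot \mathbf{19} \cdot 73$
 19     524287   $\mathbf{524287}$
 20    1048575   $3 \cdot 5^2 \cdot 11 \cdot 31 \cdot \mathbf{41}$
 21    2097151   $7^2 \cdot 127 \cdot \mathbf{337}$
 22    4194303   $3 \cdot 23 \cdot 89 \cdot \mathbf{683}$
 23    8388607   $\mathbf{47} \cdot \mathbf{178481}$
 24   16777215   $3^2 \cdot 5 \cdot 7 \cdot 13 \cdot 17 \cdot \mathbf{241}$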


Exercise 1.16. (a) Show that if $q$ is a prime then every prime divisor of $M_q$
is a primitive divisor.
(b) If $M_n$ does not have a primitive divisor show that $M_n$ divides the quantity


\[ n \prod_{\substack{p \mid n \\ p < n}} M_{n/p}. \]


(c) Deduce that for $n > 6$, every term $M_n$ has a primitive divisor if $n$ has only
two distinct prime divisors. (Hint: take logarithms of the quantities in (b) and
compare the growth rates of both sides.)

(d) What can you deduce if n has three distinct prime divisors? 


Zsigmondy’s Theorem holds in greater generality, though we will not prove 
the following result here. 


Theorem 1.16. [ZSIGMONDY] Let $a_n = c^n - d^n$, where $c > d$ are positive
coprime integers. Then $a_n$ always has a primitive divisor unless


(1) $c = 2$, $d = 1$ and $n = 6$; or
(2) $c + d = 2^k$ and $n = 2$.


Exercise 1.17. Find some nontrivial examples of case (2) of the theorem.
A more general result is considered in Exercise 8.19 on p. 169. 


Exercise 1.18. Prove that the sequence $(u_n)$ does not satisfy a Zsigmondy
Theorem in each of the following cases. This means that for every $N$ there is
a term $u_n$, $n > N$, which does not have a primitive divisor.

(a) $u_n = an + b$ for integers $a$ and $b$;

(b) $u_n = n^2 + an + b$ for integers $a$ and $b$ with the property that the zeros
of $x^2 + ax + b$ are integers;

(c)* $u_n = n^2 + an + b$ for integers $a$ and $b$.


Exercise 1.19. *Can any polynomial $u_n = n^d + a_{d-1} n^{d-1} + \cdots + a_0$ for
integers $a_0, \ldots, a_{d-1}$ have the property that the sequence $(u_n)$ satisfies a
Zsigmondy Theorem?


1.3.4 Mersenne Primes in the Computer Age 


The arrival of electronic computers extended the limits of large Mersenne 
prime-hunting dramatically. 

Table 1.3 is a short list showing how the size of the largest known Mersenne
prime has grown over recent years; $\#M_p$ denotes the number of decimal digits
in $M_p$. In 1978, Nickol and Noll were 18-year-old students. We do not distinguish
here between a Mersenne prime that is the largest known at the time
and a Mersenne prime for which all smaller Mersenne primes are known;
see the references for a more detailed discussion. In Table 1.3, (G) denotes
GIMPS and (P) denotes PrimeNet; these are distributed computer searches 
using idle time on many thousands of computers all over the world. Because 
of the special properties of Mersenne numbers (and related numbers of special 
shape), it has usually been the case that the largest explicitly known prime 
number is a Mersenne prime. 


1.4 Fermat Numbers 


Fermat noticed that the expression $F_n = 2^{2^n} + 1$ takes prime values for the
first few values of $n$:


\[ F_0 = 3, \quad F_1 = 5, \quad F_2 = 17, \quad F_3 = 257, \quad\text{and}\quad F_4 = 65537. \]


He believed the sequence might always take prime values. Euler in 1732 gave
the first counterexample, when he showed that $641 \mid F_5$.

Euler, in common with Fermat and many others, was able to perform 
these impressive calculations through a good use of technique to minimize 
the amount of calculation required. Since Euler’s time, many other Fermat 
numbers have been investigated and shown to be composite. No prime values 




Table 1.3. Largest known prime values of $M_p$ (from Caldwell’s Prime Pages [25]).


        p     #M_p   Date   Discoverer
       17        6   1588   Cataldi
       19        6   1588   Cataldi
       31       10   1772   Euler
       61       19   1883   Pervushin
       89       27   1911   Powers
      107       33   1914   Powers
      127       39   1876   Lucas
      521      157   1952   Robinson
      607      183   1952   Robinson
     1279      386   1952   Robinson
     2203      664   1952   Robinson
     2281      687   1952   Robinson
     3217      969   1957   Riesel
     4253     1281   1961   Hurwitz
     4423     1332   1961   Hurwitz
     9689     2917   1963   Gillies
     9941     2993   1963   Gillies
    11213     3376   1963   Gillies
    19937     6002   1971   Tuckerman
    21701     6533   1978   Nickol and Noll
    23209     6987   1979   Noll
    44497    13395   1979   Nelson and Slowinski
    86243    25962   1982   Slowinski
   110503    33265   1988   Colquitt and Welsh
   132049    39751   1983   Slowinski
   216091    65050   1985   Slowinski
   756839   227832   1992   Slowinski and Gage
   859433   258716   1994   Slowinski and Gage
  1257787   378632   1996   Slowinski and Gage
  1398269   420921   1996   Armengaud, Woltman et al. (G)
  2976221   895932   1997   Spence, Woltman et al. (G)
  3021377   909526   1998   Clarkson, Woltman, Kurowski et al. (G, P)
  6972593  2098960   1999   Hajratwala, Woltman, Kurowski et al. (G, P)
 13466917  4053946   2001   Cameron, Woltman, Kurowski et al. (G, P)
 20996011  6320430   2003   Shafer, Woltman, Kurowski et al. (G, P)
 24036583  7235733   2004   Findley, Woltman, Kurowski et al. (G)


of $F_n$ with $n > 4$ have been discovered, and it is generally expected that only
finitely many terms of the sequence $(F_n)$ are prime.

To begin, we return to Euler’s result that 641 divides $F_5$. First, notice
that $640 = 5 \cdot 2^7 \equiv -1$ modulo 641 so working modulo 641,


\[ 1 = (-1)^4 \equiv (5 \cdot 2^7)^4 = 5^4 \cdot 2^{28}. \]


Now $5^4 = 625 \equiv -16$ modulo 641 and $16 = 2^4$. Hence

\[ 1 \equiv -2^{32} = -2^{2^5} \pmod{641}. \]


Of course, this elegant argument is useful only once we suspect that 641
is a factor of $F_5$. Euler also used some cunning to reach that point.


Lemma 1.17. Suppose $p$ is a prime with $p \mid F_n$. Then $p = 2^{n+1} k + 1$ for
some $k \in \mathbb{N}$.


Example 1.18. When $n = 5$, Lemma 1.17 shows that if $p$ is a prime dividing $F_5$,
then $p = 2^6 k + 1 = 64k + 1$ for some $k$. Thus the list of possible divisors is
greatly reduced. We only have to test $F_5$ for divisibility by


\[ 65, 129, 193, 257, 321, 385, 449, 513, 577, 641, \ldots, \]


of which $65, 129, 321, 385, 513, \ldots$ are not primes. Therefore we only have
to test $193, 257, 449, 577, 641, \ldots$ and so on. At the fifth attempt, we find
that $641 \mid F_5$.
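
The search described in Example 1.18 can be reproduced mechanically. The sketch below is our own illustration (the names `fermat_number` and `factor_via_lemma_1_17` and the cut-off `max_k` are assumptions, not from the text): it runs through the candidates $2^{n+1} k + 1$, skips the composite ones, and counts the trial divisions actually performed.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def fermat_number(n):
    """F_n = 2^(2^n) + 1."""
    return (1 << (1 << n)) + 1


def factor_via_lemma_1_17(n, max_k=10**5):
    """Try primes p = 2^(n+1) k + 1 as divisors of F_n.

    Returns (p, attempts) for the first divisor found, or None if no
    candidate with k <= max_k and p^2 <= F_n divides F_n.
    """
    f = fermat_number(n)
    step = 1 << (n + 1)
    attempts = 0
    for k in range(1, max_k + 1):
        p = step * k + 1
        if p * p > f:
            break
        if not is_prime(p):
            continue
        attempts += 1
        if f % p == 0:
            return p, attempts
    return None


print(factor_via_lemma_1_17(5))
print(pow(2, 32, 641))  # residue of 2^32 modulo 641
```

The second line printed is the residue of $2^{2^5}$ modulo 641, which can be compared with Euler's congruence $2^{32} \equiv -1 \pmod{641}$.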


PROOF OF LEMMA 1.17. Suppose $p$ is a prime with $p \mid F_n$, so $2^{2^n} \equiv -1$
modulo $p$ and $p$ is odd. Hence


\[ 2^{2^{n+1}} = (2^{2^n})^2 \equiv (-1)^2 = 1 \pmod{p}. \]


Let $d = \gcd(2^{n+1}, p-1)$, and write $d = 2^{n+1} a + (p-1) b$ for integers $a$ and $b$
using Theorem 1.23. Just as in Equation (1.23) one of $a$ and $b$ will be negative,
so we again use Equation (1.24) to argue that


\[ 2^d = 2^{2^{n+1} a + (p-1) b} = (2^{2^{n+1}})^a (2^{p-1})^b \equiv 1 \pmod{p}. \]


Since $d \mid 2^{n+1}$, $d = 2^c$ for some $0 \le c \le n+1$ so

\[ 2^{2^c} = 2^d \equiv 1 \pmod{p}. \]

However, $2^{2^n} \equiv -1$ modulo $p$ and $-1 \not\equiv 1$ modulo $p$, so the smallest possibility
for $c$ is $(n+1)$. Hence $d = 2^{n+1}$. On the other hand, $d \mid (p-1)$ so $p - 1 = k 2^{n+1}$
as claimed.


Exercise 1.20. Strengthen Lemma 1.17 by showing that, for $n \ge 2$, any prime $p$
dividing $F_n$ must have the form $2^{n+2} k + 1$ for some $k \in \mathbb{N}$.


1.5 Primality Testing 


We have covered enough ground to take a first look at the challenges thrown 
up by primality testing. Given a small integer, one can determine if it is 
prime by testing for divisibility by known small primes. This method becomes 
totally unfeasible very quickly. We are really trying to factorize. The ability 
to rapidly factorize large integers remains the Holy Grail of computational
number theory. Later we will look at some more sophisticated techniques and 
estimate the range of integers for which they are applicable. 

For now, we concentrate on properties of primes that can be used to help 
determine primality. Fermat’s Little Theorem is an example, although it does 
not give a necessary and sufficient condition for primality, just a necessary one. 
The next result does give a necessary and sufficient condition; it is known as 
Wilson’s Theorem because of a remark to this effect allegedly made by John 
Wilson in 1770 to the mathematician Edward Waring. An early proof was 
published by Lagrange in 1772. The theorem first seems to have been noted 
by al-Haytham⁴ some 750 years before Wilson.


Theorem 1.19. An integer $n > 1$ is prime if and only if


\[ (n-1)! \equiv -1 \pmod{n}. \]


PROOF OF ‘ONLY IF’ DIRECTION. We prove that the congruence is satisfied
when $n$ is prime and leave the converse as an exercise. Assume that $n = p$ is
an odd prime. (The congruence is clear for $n = 2$.)

Each of the integers $1 < a < p-1$ has a unique multiplicative inverse
distinct from $a$ modulo $p$ (see Corollary 1.25). Uniqueness is obvious; for
distinctness, note that $a^2 \equiv 1$ modulo $p$ implies $p \mid (a+1)(a-1)$, forcing $a \equiv \pm 1$
modulo $p$ by primality. Thus in the product


\[ (p-1)! = (p-1)(p-2) \cdots 3 \cdot 2 \cdot 1, \]


all the terms cancel out modulo $p$ except the first and the last. Their product
is clearly $-1$ modulo $p$.
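
As an illustration, the criterion of Theorem 1.19 can be turned directly into code. The following Python sketch is ours (the name `wilson_is_prime` is hypothetical); it reduces the factorial modulo $n$ at each step, and still needs about $n$ multiplications, which is why it is of no use for large $n$.

```python
def wilson_is_prime(n):
    """Primality via the criterion (n-1)! = -1 (mod n)."""
    if n < 2:
        return False
    factorial = 1
    for a in range(2, n):
        factorial = (factorial * a) % n  # (n-1)! reduced modulo n as we go
    return factorial == n - 1            # n - 1 represents -1 modulo n


print([n for n in range(2, 50) if wilson_is_prime(n)])
```

Trial division up to $\sqrt{n}$ answers the same question with far fewer operations, so this is a curiosity rather than a practical test.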


Exercise 1.21. Prove the converse: If $n > 1$ and $(n-1)! \equiv -1$ modulo $n$,
then $n$ is prime.


Exercise 1.22. [GAUSS] Prove the following generalization of Theorem 1.19.
Let

\[ P_n = \prod_{\substack{m < n \\ \gcd(m,n) = 1}} m \]


be the product of all positive integers less than $n$ and coprime to $n$. Then $P_n + 1$
is divisible by $n$ if $n$ is equal to 4, $p^k$, or $2p^k$ for some odd prime $p$, and $P_n - 1$
is divisible by $n$ if $n$ is not of that form.


4 Abu Ali al-Hasan ibn al-Haytham (964-1040) lived in Persia and Egypt. He is 
most famous for Alhazen’s Problem: Find the point on a spherical mirror where 
a light will be reflected to an observer. In number theory, in addition to proving 
what we often call Wilson’s Theorem, al-Haytham worked on perfect numbers 
(see Exercise 1.13). 
Exercise 1.23. [CLEMENT] (a) Use al-Haytham’s Theorem (Theorem 1.19)
to prove that, for $n > 1$, $n$ and $n + 2$ are both prime if and only if


\[ 4\left((n-1)! + 1\right) + n \equiv 0 \pmod{n(n+2)}. \]


(b) Prove that, for $n > 13$, the triple $n$, $n + 2$, and $n + 6$ are all prime if and
only if


\[ 4320\left(4\left((n-1)! + 1\right) + n\right) + 361\,n(n+2) \equiv 0 \pmod{n(n+2)(n+6)}. \]


(c) Find a similar characterization of prime triples of the form $n$, $n + 4$,
and $n + 6$.


Primes $p$ for which $p + 2$ is also a prime are called twin primes, and it
is a long-standing conjecture that there are infinitely many twin primes. A
remarkable result of Brun from 1919 is that the reciprocals of the twin primes
(whether there are infinitely many or not) are summable:


\[ \sum_{p,\, p+2 \in \mathbb{P}} \left( \frac{1}{p} + \frac{1}{p+2} \right) = B < \infty. \tag{1.25} \]


Numerical estimation of Brun’s constant $B$ is very difficult.


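Exercise 1.24. Theorem 1.19 gives another ‘formula’ for the primes. Show
that $(n-2)!$ is congruent to 1 or 0 modulo $n$ depending on whether $n$ is prime
or not, for $n \ge 3$, $n \ne 4$.

(a) Deduce that the prime counting function $\pi(X) = |\{ p \in \mathbb{P} \mid p \le X \}|$ may
be written


\[ \pi(x) = 1 + \sum_{\substack{j = 3 \\ j \ne 4}}^{x} \left( (j-2)! - j \left\lfloor \frac{(j-2)!}{j} \right\rfloor \right), \quad x \ge 3, \]

with $\pi(1) = 0$, $\pi(2) = 1$.
(b) Define a function $f$ by $f(x, x) = 0$ and


\[ f(x, y) = \frac{1}{2} \left( 1 + \frac{x - y}{|x - y|} \right) \quad\text{for } x \ne y. \]

Use Theorem 1.9 to prove that


\[ p_n = 1 + \sum_{j=1}^{2^n} f(n, \pi(j)). \]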


j=1 


In principle, Theorem 1.19 seems to offer a general primality test because 
the condition is necessary and sufficient. The problem is that in practice it 
is impossible to compute (n — 1)! modulo n in a reasonable amount of time 
for any integer that is not quite small. In Chapter 12 we will seek to give a
better understanding of what counts as “small” or “large” in terms of modern 
computing. 

Fermat’s Little Theorem offers another hope. Taking $a = 2$, Fermat’s Little
Theorem implies that


\[ 2^{p-1} \equiv 1 \pmod{p} \quad\text{whenever } p \text{ is an odd prime.} \tag{1.26} \]


At various times in history, it has been thought that a kind of converse might
be true: If $n$ is odd and $2^{n-1} \equiv 1$ modulo $n$, might it follow that $n$ is prime?
Calculations tend to support this, and for $n < 341$ this does indeed successfully
detect primality.


Example 1.20. Testing the congruence $2^{n-1} \equiv 1$ modulo $n$ fails to detect the
fact that $n = 341 = 11 \cdot 31$ is composite. By Fermat’s Little Theorem, $2^{10} \equiv 1$
modulo 11 so $2^{340} \equiv 1^{34} = 1$ modulo 11. Also $2^5 = 32 \equiv 1$ modulo 31, so


\[ 2^{340} = (2^5)^{68} \equiv 1^{68} = 1 \pmod{31}. \]


Thus $2^{340} - 1$ is divisible by the coprime numbers 11 and 31, and hence by
their product 341, so $2^{340} \equiv 1$ modulo 341.


However, Fermat’s Little Theorem says more than Equation (1.26): It gives
the congruence

\[ a^{p-1} \equiv 1 \pmod{p} \]


for any base $a$ coprime to $p$, not just $a = 2$. Taking $a = 3$ in Example 1.20, we quickly find

\[ 3^{340} \equiv 56 \pmod{341}, \]


which contradicts Fermat’s Little Theorem with a = 3, showing that 341 
cannot be prime. Notice the recurrence of a phenomenon encountered before: 
Using a = 3, we have shown that a number is not prime without exhibiting a 
nontrivial factor. 
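
The behaviour of 341 under the two bases can be reproduced with Python's built-in modular exponentiation. This is a minimal sketch of our own; the function name `fermat_test` is hypothetical.

```python
def fermat_test(n, base):
    """Return True if base^(n-1) = 1 (mod n), i.e. n passes to this base."""
    return pow(base, n - 1, n) == 1


print(fermat_test(341, 2))   # 341 = 11 * 31 passes to base 2
print(fermat_test(341, 3))   # but it fails to base 3
print(pow(3, 340, 341))      # the residue of 3^340 modulo 341
```

A failed test proves compositeness without producing any factor of $n$, exactly as remarked above.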

This method suggests the following as a primality test. Given an integer $n$,
choose numbers $a$ at random with $1 < a < n$ and test to see if $a^{n-1} \equiv 1$
modulo $n$. If not, then $n$ is definitely composite. If the congruence is satisfied
for several such $a$, we might view this as compelling evidence that $n$ must be
prime. Unfortunately, this also fails as a primality test.


Exercise 1.25. Prove that $n = 561$ is a composite number that satisfies Fermat’s
Little Theorem for every possible base by showing that $a^{560} \equiv 1$ modulo 561
for every $a$, $1 < a < n$ with $\gcd(a, 561) = 1$. (Hint: Use Fermat’s Little
Theorem on each of the factors 3, 11, and 17 of 561.)


A composite integer that satisfies the congruence of Fermat’s Little Theorem
for all bases coprime to itself is known as a Carmichael number; these will
be discussed in more detail in Section 12.5. It was not known whether there
are infinitely many Carmichael numbers until 1994, when Alford, Granville,
and Pomerance not only proved that there are infinitely many but gave some
measure of how many there are asymptotically. The existence of infinitely
many Carmichael numbers renders the test based on Fermat’s Little Theorem
too unreliable. Later, we will see however that a more sophisticated
version is salvageable as a primality test.
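
The definition of a Carmichael number can be checked by brute force for small $n$. The following Python sketch is our own illustration (the names `is_composite` and `is_carmichael` are assumptions); it tests every base coprime to $n$, which is far too slow for large $n$ but adequate to find the first few examples.

```python
from math import gcd


def is_composite(n):
    """True if n > 1 has a divisor d with 1 < d < n."""
    if n < 4:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return True
        d += 1
    return False


def is_carmichael(n):
    """True if n is composite and a^(n-1) = 1 (mod n) for every a coprime to n."""
    if not is_composite(n):
        return False
    return all(pow(a, n - 1, n) == 1 for a in range(2, n) if gcd(a, n) == 1)


print([n for n in range(2, 2000) if is_carmichael(n)])
```

Note that 341, which passes the test to base 2, is not in the list, since it fails to base 3.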


1.6 Proving the Fundamental Theorem of Arithmetic 


We uncover Euclid’s real genius once we try to prove the Fundamental Theorem
of Arithmetic. There are two parts to it: existence and uniqueness. The existence
part is not difficult. Let $n > 1$ be an integer, and choose $r$ with $2^r > n$.
If $n$ itself is not divisible by any $a$ with $1 < a < n$, then nothing else needs
to be said. Otherwise, we can write $n = ab$ with $1 < a, b < n$. Again, if $a$
and $b$ cannot be factorized further, then we are done. If this is not the case
then at least one of them can be factorized. If this process could be continued
until $n$ were written as a product of $r$ factors, we would have $n = a_1 \cdots a_r$
with each $1 < a_i < n$. This implies $n \ge 2^r$, giving a
contradiction. Thus $n$ must be a product of no more than $r$ prime factors.

It is when we come to the uniqueness part of the proof that we uncover a
subtlety: the definition of prime as an irreducible element is not
really adequate to prove the Fundamental Theorem of Arithmetic. Suppose
we try to argue as follows: Consider two factorizations for $n$ into primes, say


\[ p_1 \cdots p_r = n = q_1 \cdots q_s. \]


We would like to say that because $p_1$ divides the right-hand side, it must
divide one of the $q_j$. However, if we are working with the definition of prime
as irreducible, then we need a result that tells us that being irreducible forces
this divisibility property. Such a result may be found using the Euclidean
Algorithm.

Later, we will see examples in rings that are closely related to Z whose 
elements have genuinely different factorizations into irreducibles. 


Exercise 1.26. Let

\[ A = \{ n \in \mathbb{N} \mid n \equiv 1 \pmod{4} \}, \]


and call $n \ne 1$ an $A$-prime if the only divisors of $n$ in $A$ are 1 and $n$.

(a) Show that every element of $A$ except 1 factorizes as a finite product of $A$-primes.

(b) Show that this factorization into $A$-primes is not unique.


1.6.1 The Euclidean Algorithm 


Given a,b > 0 in Z, we can always find q and r with a= bg+rand0<r<b. 
Indeed, for g we can simply take the integer part |a/b| of a/b and then show 
that by defining r = a — bq we must haveO<r<b. 
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Something very interesting happens when we iterate this process. It will 
help to define gq = q, and r = r; and continue to find quotients and remainders 
as follows: 

a=bqa +11, 0<r<b 
b=riqatre, O<rg<ry 


Tn-3 = Tn—-24n-1 + Tn-1; 0<Tn-1 <Tn-2 
Tn—-2 = Tn-19n T Tn) O< Tr <Tr-1 
Tn—-1 = TnQn41 1 0. 


The sequence of remainders is decreasing and each term is nonnegative, so the
sequence must terminate. We have written $r_n$ for the last nonzero remainder,
so $r_n \mid r_{n-1}$. We claim that $r_n$ is the greatest common divisor of a and b.


Example 1.21. Let a = 17 and b = 11. Then the Euclidean Algorithm gives 
the equations 


$$\begin{aligned}
17 &= 11 \cdot 1 + 6, \\
11 &= 6 \cdot 1 + 5, \\
6 &= 5 \cdot 1 + 1, \\
5 &= 1 \cdot 5 + 0.
\end{aligned}$$


The last nonzero remainder is the greatest common divisor of 17 and 11, which 
is clearly 1. 
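
The chain of divisions can be sketched in a few lines of Python. This is only an illustration under the assumption that a and b are positive integers; the function names `euclid_steps` and `last_nonzero_remainder` are ours and do not come from the text.

```python
def euclid_steps(a, b):
    """Return the rows (dividend, divisor, quotient, remainder) produced by
    repeated division with remainder, for integers a, b > 0."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)          # a = b*q + r with 0 <= r < b
        steps.append((a, b, q, r))
        a, b = b, r                  # the divisor and remainder feed the next row
    return steps


def last_nonzero_remainder(a, b):
    """The last nonzero remainder of the chain (equal to b if b divides a)."""
    while b != 0:
        a, b = b, a % b
    return a


for dividend, divisor, q, r in euclid_steps(17, 11):
    print(f"{dividend} = {divisor}*{q} + {r}")
print(last_nonzero_remainder(17, 11))
```

Running it on 17 and 11 reproduces the four equations of Example 1.21, one row per line.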


To prove that $r_n = \gcd(a, b)$, we need a better notion of greatest common
divisor than the intuitive one.


Definition 1.22. If a and b in Z are not both zero, d is said to be a greatest
common divisor of a and b if


(1) $d \mid a$ and $d \mid b$; and
(2) if $d'$ is any number with $d' \mid a$ and $d' \mid b$, then $d' \mid d$.


The first condition says d is a common divisor of a and b, while the second 
says it is the greatest such divisor. 

Note that we say “a” greatest common divisor rather than “the” greatest 
common divisor because if d satisfies this condition then —d will also sat- 
isfy the definition. If we work in N, then the greatest common divisor will 
be unique. The notation gcd(a,b) denotes the unique nonnegative greatest 
common divisor of a and b. If gcd(a, b) = 1, then we will call a and b coprime. 


Exercise 1.27. Using Definition 1.22, show that $r_n = \gcd(a, b)$. (Hint: Work
your way up and then down the chain of equations to verify the two properties.)

The next result is fundamental to the structure of the integers; it is an
easy consequence of the Euclidean Algorithm and is sometimes referred to as
Bézout’s Lemma.


Theorem 1.23. If d = gcd(a, b) with $a, b \in \mathbb{Z}$ not both zero, then there are
numbers $x, y \in \mathbb{Z}$ with
$$d = ax + by. \tag{1.27}$$


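PROOF. The idea is to work your way up the chain of equations in the Euclidean
Algorithm, always expressing the remainder in terms of the previous
two remainders. Writing $*$ for an integer, we get

$$\begin{aligned}
\gcd(a, b) = r_n &= r_{n-2} - r_{n-1} q_n \\
&= r_{n-2}(1 + q_n q_{n-1}) - r_{n-3} q_n \\
&\;\;\vdots \\
&= r_{n-3} \cdot * + r_{n-4} \cdot * \\
&\;\;\vdots \\
&= b \cdot * + r_1 \cdot * \\
&= a \cdot * + b \cdot *.
\end{aligned}$$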


Example 1.24. Using the equations from Example 1.21 we find that 


$$\begin{aligned}
1 &= 6 - 5 \\
&= 6 - (11 - 6) \\
&= 2 \cdot 6 - 11 \\
&= 2(17 - 11) - 11 \\
&= 2 \cdot 17 - 3 \cdot 11.
\end{aligned}$$
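
The back-substitution can also be organized as a forward pass that carries the coefficients along with each remainder. The sketch below takes that route rather than literally working up the chain, so it is a variant of the method in the proof, not a transcription of it. The name `bezout` is illustrative, and nonnegative inputs that are not both zero are assumed.

```python
def bezout(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y,
    for integers a, b >= 0 that are not both zero."""
    # Invariants: old_r = a*old_x + b*old_y  and  r = a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


d, x, y = bezout(17, 11)
print(d, x, y)               # coefficients for the pair of Example 1.21
print(17 * x + 11 * y == d)
```

For a = 17 and b = 11 the coefficients agree with the last line of Example 1.24. The pair (x, y) is not unique in general, so a different procedure may return other valid coefficients.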


Corollary 1.25. Let n > 1 and a denote elements of Z. Then a and n are
coprime if and only if there exists x with


$$ax \equiv 1 \pmod n.$$
That is, gcd(a, n) = 1 if and only if a is invertible modulo n.
PROOF. The congruence is equivalent to the existence of an integer y with
$$ax + ny = 1.$$


If a and n have a factor in common then that factor will also divide 1, so the 
congruence implies a and n are coprime. Conversely, if a and n are coprime 
then 1 is a greatest common divisor of a and n so we can use Theorem 1.23 
to see that there are integers x and y with ax + ny = 1, which translates into 
the congruence. 
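
Corollary 1.25 can be checked numerically. The sketch below relies on Python's built-in three-argument `pow`, which accepts the exponent -1 from Python 3.8 onward; the helper name `inverse_mod` is ours.

```python
from math import gcd


def inverse_mod(a, n):
    """Return x in {0, ..., n-1} with a*x ≡ 1 (mod n),
    or None when gcd(a, n) != 1 and no inverse exists."""
    if gcd(a, n) != 1:
        return None
    return pow(a, -1, n)     # built-in modular inverse (Python 3.8+)


print(inverse_mod(11, 17))
print(inverse_mod(4, 6))
print([a for a in range(1, 12) if inverse_mod(a, 12) is not None])
```

The last line lists the invertible residues modulo 12, which are exactly the residues coprime to 12.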

Exercise 1.28. Let p be a prime. Prove that the set $(\mathbb{Z}/p\mathbb{Z})^*$ of nonzero
elements in $\mathbb{Z}/p\mathbb{Z}$ forms a group under multiplication modulo p.


One of the remarkable things about the Euclidean Algorithm is that it 
finds the greatest common divisor of two integers without factorizing either 
of them. We will see later how this has been exploited in powerful ways by 
computational number theory in recent years. 


Exercise 1.29. Prove the Fundamental Theorem of Arithmetic using
Theorem 1.23. (Hint: This is done in greater generality on p. 47.)


1.6.2 An Inductive Proof of Theorem 1.1 


We wish to prove that any natural number n has a decomposition $n = p_1 \cdots p_r$
into primes uniquely up to rearrangement of the prime factors.

For n = 2, the theorem is clearly true. We proceed by induction. Suppose 
that the Fundamental Theorem of Arithmetic holds for all natural numbers 
strictly less than some a > 1. We want to deduce the Fundamental Theorem 
of Arithmetic for a. Let 

$$D = \{d \mid d > 1,\ d \mid a\}$$


denote the set of non-identity divisors of a. The set D is nonempty since it 
contains a, so it has a smallest element, which we denote p. This smallest 
element must be a prime because if it had a nontrivial divisor that would be 
a smaller element of D. Thus we have a decomposition 


$$a = pb, \quad p \text{ prime}, \quad b < a.$$


Since b < a, by the inductive hypothesis, the Fundamental Theorem of Arithmetic
holds for b, so there is a prime decomposition


$$b = p_1 \cdots p_s$$
into primes uniquely up to rearrangement. It follows that
$$a = p \cdot p_1 \cdots p_s$$


is a prime decomposition of a, and a has no other prime decomposition involving
the prime p.
Suppose that a has another prime decomposition,


$$a = q_1 \cdots q_r,$$


in which the prime p does not appear. In particular, $q_1 \ne p$. Moreover, by the
definition of p, $q_1 > p$ since $q_1 \in D$, so $1 \le q_1 - p < q_1$. Let $c = q_2 \cdots q_r$, and
define

$$a_0 = a - pc = p(b - c) = (q_1 - p)c. \tag{1.28}$$


Now $1 < a_0 < a$ and the divisors $(b - c)$, $(q_1 - p)$, and c are all less than a.
By the inductive hypothesis, the numbers $a_0$, $(b - c)$, $(q_1 - p)$, and c all have
unique prime decompositions. By Equation (1.28), the prime p must appear
in any prime decomposition of $a_0$ and therefore (by uniqueness) must also
appear in the decomposition of $(q_1 - p)$ or that of c.

Now p cannot appear in a prime decomposition of $(q_1 - p)$ because that
would require $p \mid q_1$, which is impossible, as p and $q_1$ are distinct primes. Nor
can p appear in a prime decomposition of $c = q_2 \cdots q_r$ by assumption. Thus
the assumption of a second prime decomposition for a leads to a contradiction,
completing the proof of the Fundamental Theorem of Arithmetic.
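
The existence half of this argument is effectively an algorithm: peel off the smallest element p of D and repeat with b = a/p. A minimal Python sketch follows. The function names are ours, and trial division is used only because it is the simplest way to find the smallest divisor, not because the text prescribes it.

```python
def smallest_divisor(a):
    """Smallest element of D = {d : d > 1, d | a}, for an integer a > 1.
    As in the proof, this element is necessarily prime."""
    d = 2
    while d * d <= a:
        if a % d == 0:
            return d
        d += 1
    return a                 # no divisor up to sqrt(a), so a itself is prime


def prime_factors(a):
    """Prime factors of a > 1 in nondecreasing order, obtained by repeatedly
    writing a = p*b with p the smallest divisor and continuing with b."""
    factors = []
    while a > 1:
        p = smallest_divisor(a)
        factors.append(p)
        a //= p
    return factors


print(prime_factors(360))
print(prime_factors(97))
```

The code only produces one decomposition. That no other decomposition exists is the content of the uniqueness argument above and is not something the code verifies.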


1.7 Euclid’s Theorem Revisited 


In this section, three further proofs of Theorem 1.2 are given, each interesting 
and suggestive in its own right. 


1.7.1 What Did Euclid Really Prove? 


First, we return to the master’s proof. The following is a translation of Euclid’s 
proof taken from Joyce’s Web translation of Euclid’s Elements. In Euclid’s 
time, numbers were thought of as relatively concrete lengths of line segments. 
Thus, for example, a number A measures a number B if a stick of length A 
could be used to fit into a stick of length B a whole number of times. In 
modern terminology, A divides B. We start with Euclid’s Theorem in (an 
approximation of) Euclid’s language: 


Οἱ πρῶτοι ἀριθμοὶ πλείους εἰσὶ παντὸς τοῦ
προτεθέντος πλήθους πρώτων ἀριθμῶν.


A translation of this is the following theorem, which is Proposition 20 of 
Book IX in Euclid’s Elements. 


Theorem 1.26. The prime numbers are more than any assigned multitude of 
prime numbers. 


PROOF. Let A, B, and C be the assigned prime numbers. I say that there are
more prime numbers than A, B, and C. Take the least number DE measured 
by A, B, and C. Add the unit DF to DE. 

Then EF is either prime or not. 

First, let it be prime. Then the prime numbers A, B, C, and EF have 
been found, which are more than A, B, and C. 

Next, let EF not be prime. Therefore, it is measured by some prime number.
Let it be measured by the prime number G. I say that G is not the same
as any of the numbers A, B, and C.

If possible, let it be so.

Now A, B, and C measure DE, and therefore G also measures DE. But
it also measures EF. Therefore G, being a number, measures the remainder,
the unit DF, which is absurd.

Therefore G is not the same as any one of the numbers A, B, and C, 
and by hypothesis it is prime. Therefore, the prime numbers A, B, C, and G 
have been found, which are more than the assigned multitude of A, B, and C. 
Therefore, prime numbers are more than any assigned multitude of prime 
numbers. 


There is little difference between this argument and Euclid’s proof in modern form
on p. 8. Euclid did not have our modern notion of infinity, so he proved that
there are more primes than any prescribed number. He also often stated proofs 
using examples (in this case, what he really proves is that there are more than 
three primes), but it is clear he understood the general case. It is possible that 
part of the reason for this is the notational difficulties involved in dealing with 
arbitrarily large finite lists of objects. 


1.7.2 A Topological Proof of Theorem 1.2 


In 1955, Furstenberg gave a completely different type of proof of the infinitude 
of the primes using ideas from topology. 

FURSTENBERG’S TOPOLOGICAL PROOF OF THEOREM 1.2. Define a topology
on the integers Z by taking as a basis the arithmetic progressions. For each
prime p, let $S_p$ denote the arithmetic progression $p\mathbb{Z}$. Since


$$S_p = \mathbb{Z} \setminus \bigl((p\mathbb{Z} + 1) \cup \cdots \cup (p\mathbb{Z} + (p - 1))\bigr),$$


the set $S_p$ is the complement of an open set, and thus is closed. Let $S = \bigcup_p S_p$
be the union of all the sets $S_p$ as p varies over the primes. If there are only
finitely many primes, then S is a finite union of closed sets, and thus is closed.
However, every integer except $\pm 1$ is in some $S_p$, so the complement of S
is $\{1, -1\}$, which is clearly not open, since every nonempty open set contains
an arithmetic progression and is therefore infinite. It follows that S cannot be closed and
therefore cannot be a finite union, so there must be infinitely many primes.


In contrast with the other proofs of Theorem 1.2, this is qualitative: all
it tells us about the prime counting function is that $\pi(X) \to \infty$ as $X \to \infty$.


1.7.3 Goldbach’s Proof 


Goldbach showed how one may use a sequence of integers with the property 
that an infinite subsequence are pairwise coprime to give a different proof. 


GOLDBACH’S PROOF OF THEOREM 1.2. We claim that the Fermat numbers
$F_n = 2^{2^n} + 1$ are pairwise coprime:


$$m \ne n \implies \gcd(F_m, F_n) = 1. \tag{1.29}$$

The first step is to show by induction that
$$F_m - 2 = F_0 F_1 \cdots F_{m-1} \quad \text{for all } m \ge 1. \tag{1.30}$$


To see why this is true, first note that $F_1 - 2 = F_0$ and assume that
Equation (1.30) holds for $m \le k$. Then


$$\begin{aligned}
F_0 F_1 \cdots F_{k-1} F_k &= (F_k - 2) F_k \\
&= (2^{2^k} - 1)(2^{2^k} + 1) \\
&= 2^{2^{k+1}} - 1 = F_{k+1} - 2,
\end{aligned}$$
showing Equation (1.30) by induction. Thus for m > n,


$$d \mid F_m,\ d \mid F_n \implies d \mid F_m - 2 \implies d \mid 2,$$


which forces d to be 1 since all the $F_n$ are odd numbers. This proves
Equation (1.29).

This in turn means there must be infinitely many primes. By Theorem 1.1,
each $F_n$ has a prime factor $p_n$, say, and by Equation (1.29) these are all
distinct.
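
Both displayed facts about Fermat numbers are easy to test for small indices. The following Python sketch does so; the helper names are ours, and a finite check of this kind illustrates Equations (1.29) and (1.30) but of course proves nothing.

```python
from math import gcd, prod


def fermat(n):
    """The n-th Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


def check_product_identity(limit):
    """Check F_m - 2 = F_0 F_1 ... F_(m-1) for 1 <= m <= limit (Equation (1.30))."""
    return all(fermat(m) - 2 == prod(fermat(j) for j in range(m))
               for m in range(1, limit + 1))


def check_pairwise_coprime(limit):
    """Check gcd(F_m, F_n) = 1 for 0 <= n < m <= limit (Equation (1.29))."""
    return all(gcd(fermat(m), fermat(n)) == 1
               for m in range(limit + 1) for n in range(m))


print([fermat(n) for n in range(5)])
print(check_product_identity(8), check_pairwise_coprime(8))
```

Python integers have arbitrary precision, so the 78-digit number $F_8$ poses no difficulty here.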


The proof using Fermat numbers actually does a little more than prove
there are infinitely many primes. It also gives some insight into how many
primes there are that are smaller than a given number. By the time we reach
the number $F_n$, we must have seen at least n different primes, so


$$\pi(X) \ge \frac{1}{\log 2} \log\left(\frac{\log(X - 1)}{\log 2}\right),$$


which is approximately proportional to log log X. This is far weaker than the
remark on p. 21.


NOTES TO CHAPTER 1: The exact history of Theorem 1.1 is not clear, and it is 
likely that it was known and used long before it was explicitly stated. The earliest 
precise formulation and proof seems to be due to Gauss [67], but it could be argued
that Euclid certainly knew that if a prime p divides a product ab, then p must
divide a or b, and that his geometrical formalism and approach to exposition did not
require him to consider products of more than three terms (see Section 1.7.1). Many
of the proofs of Euclid’s Theorem are featured in the Prime Pages Web site [25]; 
Ribenboim’s book [125] describes no fewer than 11 proofs. Example 1.7 is related to 
subtle problems in algebraic number theory; see Ribenboim’s book [125] for a dis- 
cussion and detailed references. That the positive values of a polynomial in several 
variables could coincide with the primes is essentially a by-product of Matijasevic’s 
solution to one of Hilbert’s famous problems. Some of the history and references 
and two explicit polynomials are given in accessible form in the paper [85] of Jones, 
Sato, Wada and Wiens. The proofs of Lemma 1.8 and Theorem 1.9 are those of
Erdős [51] and Kalmár, and may be found in Hardy and Wright [75]; that of
Corollary 1.10 follows a survey paper of Dudley [46]. Bertrand’s Postulate (Theorem 1.9)
was first proved by Tchebychef [151, Tome I, pp. 49-70, 63]. He also proved that
for any $\varepsilon > \frac{1}{5}$ there is a prime between x and $(1 + \varepsilon)x$ for x sufficiently large. The
deep result of Ingham [80] has been improved a great deal: for example, Baker,
Harman and Pintz [8] have shown that there is a prime in the interval $[x - x^{0.525}, x]$
for x sufficiently large. Exercise 1.7 is due to Mills [107]. Exercise 1.8 comes from a
paper of Richert [127]; Exercise 1.9 from a paper of Dressler [45]. Further material 
on Mersenne primes — and on large primes in general — may be found on Caldwell’s 
Prime Pages Web site [25]; Table 1.3 is taken from his Web site. A recent account 
of the GIMPS record-breaking prime is in Ziegler’s short article [167]. Zsigmondy’s 
Theorems 1.15 and 1.16 appeared first in his paper [168]; a more accessible proof 
may be found in a short paper by Roitman [132]. Deep recent work has extended this 
to a larger class of sequences: Bilu, Hanrot and Voutier have shown that for n > 30 
the nth term of any Lucas or Lehmer sequence has a primitive divisor in their 
paper [15]. The current status of Fermat numbers and their factorization may be 
found on Keller’s Web site [88]. Parts of the intricate connection between group 
theory and the origins of modern number theory, and in particular a discussion of
how Gauss used group-theoretic concepts long before they were formalized, are in a
paper of Wussing [164]. For more on the very special numbers found in Exercise 1.11
see Ribenboim’s popular article [123]. The inductive proof of Theorem 1.1 in Sec- 
tion 1.6.2 is taken from Hasse’s classic text [76] and is attributed there to Zermelo. 
Hasse’s text is also the source of the statement of Euclid’s Theorem in Greek in 
Section 1.7.1. We thank David Joyce for permission to use the translation in Sec- 
tion 1.7 from his Web site [86]; this Web site is based on several translations of 
Euclid’s work, but the primary and most accessible source remains the translation 
by Heath [53]. Exercise 1.24 is taken from Hardy and Wright [75]. Furstenberg’s 
proof of Euclid’s Theorem appeared in [63]. Exercise 1.23 is taken from Clement’s 
paper [31]. Brun’s result in Equation (1.25) appeared originally in his paper [24]; a 
modern proof may be found in the book of LeVeque [100]. Finally, we make some 
remarks concerning Section 1.7.2. Using topology in this setting might seem odd,
but perhaps Euler’s proof using the harmonic series seemed odd when it first
appeared. We don’t wish to stretch the point, but it could just be that Furstenberg’s
proof points forward to new ways of looking at arithmetic in just the same way
as Euler’s did. Profound structures in the integers have certainly been uncovered 
using methods from ergodic theory, combinatorics, functional analysis, and Fourier 
analysis; see a survey paper of Bergelson [11], the book by Furstenberg [64], and 
a new approach in a paper of Gowers [72] for some of these startling results. In a 
similar vein, Green and Tao [73] have recently proved the deep result that the primes 
contain arbitrarily long arithmetic progressions. 


Diophantine Equations


Diophantine equations are equations (very often involving polynomials with 
integer coefficients) in which the solutions are required to be integers. They 
have been studied since antiquity and are mathematically both challenging 
and attractive because of the great diversity of methods that are needed to 
understand them. 


2.1 Pythagoras 


In this chapter, we are going to explore the relationship between the Fun- 
damental Theorem of Arithmetic and the study of polynomial Diophantine 
problems. We begin with an equation handed down from antiquity, 


$$x^2 + y^2 = z^2. \tag{2.1}$$


We know that an equation of this kind is related to a right-angled triangle with 
side lengths x,y, and z. Right-angled triangles have been studied and used 
for four thousand years (at least). Equation (2.1) is called the Pythagorean 
equation to honor Pythagoras for his result connecting Equation (2.1) to right- 
angled triangles. We seek to identify all the integral solutions; that is, to find
all triples of integers (x, y, z) that satisfy Equation (2.1). The main point in
the first three sections of this chapter is to emphasize the symbiosis between 
properties of numbers and solutions of equations. 
To motivate what follows, rearrange the equation to read 


$$x^2 = z^2 - y^2 = (z + y)(z - y). \tag{2.2}$$
If we knew that gcd(z + y, z − y) = 1, then we could apply the Fundamental
Theorem of Arithmetic to argue that both (z + y) and (z − y) must themselves
be squares and use the resulting equations to parametrize all triples of
solutions.

To refine the proof, we resort to a congruence argument. First, we may
assume that the triple (x, y, z) contains no common prime factor; otherwise
we may divide through by the square of that factor. A triple (x, y, z) is called
a primitive solution of Equation (2.1) if x, y, and z have no common factor.
Second, we may assume that only one of the three is even because if two are
then the third must be, contrary to the primitive condition. Now the even one
out (so to speak) cannot be z because


$$x^2 + y^2 \equiv 0 \pmod 4$$


is impossible with x and y being odd. Thus we may suppose one of x or y is
even. Without loss of generality, suppose it is x that is even. Write $x = 2x'$
and substitute into Equation (2.2) to give


$$x'^2 = \left(\frac{z + y}{2}\right)\left(\frac{z - y}{2}\right).$$

Notice that each of $(z \pm y)/2$ must be an integer because z and y are both odd.
More than that, they must be coprime because any common factor of any two
of x, y, and z must divide the third. Hence, any common divisor of $(z \pm y)/2$
will also divide their sum and their difference, z and y, and we are assuming
the triple (x, y, z) is primitive.

Thus at last we may apply the Fundamental Theorem of Arithmetic to
deduce that $(z \pm y)/2$ are both squares, say


$$z + y = 2m^2, \quad z - y = 2n^2, \quad m > n. \tag{2.3}$$


We are assuming z and y are positive so $z + y > z - y$, giving the inequality
between m and n. Solving Equation (2.3) for z and y and then using
Equation (2.1) to find x gives the following characterization of primitive
Pythagorean triples.


Theorem 2.1. The primitive integral solutions of the Pythagorean equation


$$x^2 + y^2 = z^2$$


with even x are given by
$$x = 2mn, \quad y = m^2 - n^2, \quad z = m^2 + n^2$$
with m > n coprime integers, not both odd.


The integers m > n are said to parametrize the solutions of the equation. 
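
The parametrization of Theorem 2.1 translates directly into a generator of primitive triples. The Python sketch below is illustrative only. The function name `primitive_triples` and the cut-off `limit` on z are our own choices, and n is taken positive so that all three entries are positive.

```python
from math import gcd


def primitive_triples(limit):
    """Primitive triples (x, y, z) with x even and z <= limit, using
    x = 2mn, y = m^2 - n^2, z = m^2 + n^2 with m > n >= 1 coprime, not both odd."""
    triples = []
    m = 2
    while m * m + 1 <= limit:
        for n in range(1, m):
            # coprime with m - n odd is the same as coprime and not both odd
            if (m - n) % 2 == 1 and gcd(m, n) == 1:
                z = m * m + n * n
                if z <= limit:
                    triples.append((2 * m * n, m * m - n * n, z))
        m += 1
    return triples


for x, y, z in primitive_triples(30):
    print(x, y, z, x * x + y * y == z * z)
```

With `limit = 30` this lists the five primitive triples whose hypotenuse is at most 30, each with its even leg first.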


Exercise 2.1. For any primitive solution of Equation (2.1) show that one 
of x,y, or z is divisible by 3, one by 4, and one by 5. 


Exercise 2.2. Finding integral solutions to Equation (2.1) is equivalent to
finding rational solutions to $x^2 + y^2 = 1$. Find the second point of intersection
with the circle $x^2 + y^2 = 1$ of the line with slope t through the point (1, 0),
and show that letting t run through all rationals gives all rational solutions
to $x^2 + y^2 = 1$.


Using geometry to construct new rational solutions of Diophantine equa- 
tions from old ones is a powerful idea that will be taken up again in Section 5.1. 


2.2 The Fundamental Theorem of Arithmetic in 
Other Contexts 


In the integers, the Fundamental Theorem of Arithmetic is a direct conse- 
quence of the existence of the Euclidean Algorithm. In certain rings, the 
two properties are not equivalent. For example, the Fundamental Theorem 
of Arithmetic holds in the ring of integer polynomials Z[x], even though this 
ring does not have a Euclidean Algorithm. Nonetheless, in many arithmetic 
contexts, the Fundamental Theorem of Arithmetic can be proven easily be- 
cause one has a Euclidean Algorithm. We will consider only commutative rings 
with a multiplicative identity, written 1. 


Definition 2.2. A commutative ring R is Euclidean if there is a function
$$N : R \setminus \{0\} \to \mathbb{N}$$


with the following properties:


(1) N(ab) = N(a)N(b) for all $a, b \in R$, and
(2) for all $a, b \in R$, if $b \ne 0$, then there exist $q, r \in R$ such that


$$a = bq + r \quad \text{and} \quad r = 0 \text{ or } N(r) < N(b).$$
Such a function is called a norm on R.


Much of what follows can be done with weaker conditions. In particular, 
one does not need such a strong property as (1). However, in many cases, the 
norm does have this property, so we assume it to allow a speedier and more 
natural development of the argument. 


Example 2.3. The following are examples of Euclidean rings.
(1) Let $R = \mathbb{Z}[i]$ denote the Gaussian integers, so
$$R = \{x + iy \mid x, y \in \mathbb{Z}\},$$


where $i^2 = -1$. Setting $N(x + iy) = x^2 + y^2$ shows that R is a Euclidean
ring.

(2) Let F denote any field and let $R = F[x]$ be the ring of polynomials with
coefficients in F. Define $N(f) = 2^{\deg(f)}$, where deg(f) is the degree of f
in F[x], which is defined for all nonzero elements of R.


We prove the first of these; the second is an exercise. 


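PROOF THAT Z[i] IS EUCLIDEAN. Condition (1) of Definition 2.2 is easily
verified by direct computation. For property (2), let $a, b \in R$ with $b \ne 0$ and
write $ab^{-1} = u + iv$ with $u, v \in \mathbb{Q}$. Now define $m, n \in \mathbb{Z}$ by


$$m \in [u - 1/2, u + 1/2), \quad n \in [v - 1/2, v + 1/2).$$


Let $q = m + in \in R$ and $r = a - b(m + in)$. For $r \ne 0$, extending N to $\mathbb{Q}(i)$ by the same formula,

$$N(r) = N(b)\,N\bigl((u - m) + i(v - n)\bigr) = N(b)\bigl((u - m)^2 + (v - n)^2\bigr) \le N(b)\left(\tfrac{1}{4} + \tfrac{1}{4}\right) < N(b),$$

showing property (2).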
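
The rounding step in this proof can be carried out in exact integer arithmetic. In the Python sketch below a Gaussian integer x + iy is represented by the tuple (x, y). All function names are ours, and the rounding rule follows the half-open intervals used in the proof.

```python
def norm(z):
    """N(x + iy) = x^2 + y^2 for z = (x, y)."""
    x, y = z
    return x * x + y * y


def mul(z, w):
    """Product of two Gaussian integers given as (real, imaginary) tuples."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


def nearest(num, den):
    """The integer m with num/den - 1/2 <= m < num/den + 1/2, for den > 0."""
    return -((den - 2 * num) // (2 * den))


def gauss_divmod(a, b):
    """Return (q, r) in Z[i] with a = b*q + r and r = 0 or N(r) < N(b), for b != 0."""
    den = norm(b)
    # a / b = a * conj(b) / N(b); its real and imaginary parts are rational
    re_num = a[0] * b[0] + a[1] * b[1]
    im_num = a[1] * b[0] - a[0] * b[1]
    q = (nearest(re_num, den), nearest(im_num, den))
    bq = mul(b, q)
    r = (a[0] - bq[0], a[1] - bq[1])
    return q, r


q, r = gauss_divmod((27, 23), (8, 1))
print(q, r, norm(r), norm((8, 1)))
```

For a = 27 + 23i and b = 8 + i the quotient is 4 + 2i and the remainder is −3 + 3i, of norm 18, which is less than N(b) = 65.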


Exercise 2.3. When R = Z, for any fixed a and b, the values of q and r in
Definition 2.2(2) are uniquely determined. Is the same true when R = Z[i]?


In any ring, we define greatest common divisors in exactly the same way 
as before. A greatest common divisor is defined up to multiplication by units 
(invertible elements). In any Euclidean ring, the function N can be used to 
define a Euclidean Algorithm, which can be used to find the greatest common 
divisor just as for the integers. 


Definition 2.4. In a ring R,


(1) $\alpha$ divides $\beta$, written $\alpha \mid \beta$, if there is an element $\gamma \in R$ with $\beta = \alpha\gamma$;
(2) u is a unit if u divides 1;
(3) $\pi$ (not equal to zero nor to a unit) is prime if for all $\alpha, \beta \in R$,


$$\pi \mid \alpha\beta \implies \pi \mid \alpha \text{ or } \pi \mid \beta;$$
(4) a non-unit $\mu$ is irreducible if
$$\mu = \alpha\beta \implies \alpha \text{ or } \beta \text{ is a unit}.$$


Notice that $u \in R$ is a unit if and only if there is some $v \in R$ with uv = 1. We
write U(R) or $R^*$ for the units in the commutative ring R; this is an Abelian
group under multiplication. If the recent clutch of definitions is new to you,
we recommend the following exercise.


Exercise 2.4. (a) Show that, in any commutative ring, every prime element 
is irreducible. 

(b) Show that, in a Euclidean ring, u is a unit if and only if N(u) = 1. 

(c) Show that there are infinitely many units in $\mathbb{Z}[\sqrt{3}]$.

(d) Show that $3 + \sqrt{-2}$ is an irreducible element of $\mathbb{Z}[\sqrt{-2}]$.

(e) Let $\xi = \frac{-1 + \sqrt{-3}}{2}$ and $R = \mathbb{Z}[\xi]$. Prove that R is a Euclidean domain with
respect to the norm $N(a + b\xi) = a^2 - ab + b^2 = (a + b\xi)(a + b\bar{\xi})$ and find all
the units in R.


Exercise 2.5. Prove the Remainder Theorem: For a polynomial $f \in F[x]$, F
a field, f(a) = 0 if and only if $(x - a) \mid f(x)$.


Exercise 2.6. Give a different proof of Lemma 1.17 on p. 31 using group theory
by considering the multiplicative group of units $U(\mathbb{Z}/F_n\mathbb{Z}) = (\mathbb{Z}/F_n\mathbb{Z})^*$.


Exercise 2.7. Prove that Z[x] does not have a Euclidean Algorithm by showing
that the equation 2f(x) + xg(x) = 1 has no solution for $f, g \in \mathbb{Z}[x]$, but 2
and x have no common divisor in Z[x].


Despite the conclusion of Exercise 2.7, the ring Z[x] does have unique 
factorization into irreducibles. 

We will say that a ring has the Fundamental Theorem of Arithmetic if 
either of the following properties hold. 


(FTA1) Every irreducible element is prime. 


(FTA2) Every nonzero non-unit can be factorized uniquely up to order and 
multiplication by units. 


Theorem 2.5. Every Euclidean ring has the Fundamental Theorem of Arith- 
metic. 


PROOF. Clearly, every irreducible $\mu$ has $N(\mu) \ge 2$. Arguing as we did in Z
shows we cannot keep factorizing into irreducibles forever, so the existence
part is easy. To complete the argument, we just need to show that every
irreducible is prime. This follows easily from Theorem 1.23. Let $\mu$ be an
irreducible and suppose that $\mu$ divides $\alpha\beta$ but $\mu$ does not divide $\alpha$. Clearly, the
greatest common divisor of $\mu$ and $\alpha$ is 1 because $\mu$ admits only itself and units
as divisors and $\mu$ does not divide $\alpha$, so we can write


$$\mu x + \alpha y = 1$$
for some $x, y \in R$ by Theorem 1.23 (whose proof carries over to any Euclidean ring). Multiply through by $\beta$ to obtain
$$\mu x \beta + \alpha \beta y = \beta.$$


Since $\mu$ divides both terms on the left-hand side, it must divide the right-hand
side, and this completes the proof.


2.3 Sums of Squares


The resolution of the Pythagorean equation (Equation (2.1)) is an elementary
and well-known result. We are now going to show how the Fundamental
Theorem of Arithmetic in other contexts can yield solutions to less tractable
Diophantine equations. Consider the following problem: Which integers can
be represented as the sum of two squares? That is, what are the solutions to
the Diophantine problem

$$n = x^2 + y^2?$$


When n is a prime, experimenting with a few small values suggests the fol- 
lowing. 


Theorem 2.6. The prime p can be written as the sum of two squares if and 
only if p= 2 or p is congruent to 1 modulo 4. 


To prove this, we are going to use the Fundamental Theorem of Arithmetic
in the ring of Gaussian integers R = Z[i] with norm function $N : R \to \mathbb{N}$
defined by $N(x + iy) = x^2 + y^2$ as in Example 2.3(1).


Lemma 2.7. If p is 2 or a prime congruent to 1 modulo 4, then the congruence
$$T^2 + 1 \equiv 0 \pmod p$$
is solvable in integers.


PROOF. This is clear for p = 2 so suppose p = 4n + 1 for some integer n > 0.
Using al-Haytham’s Theorem (Theorem 1.19),


$$(p - 1)! = (p - 1)(p - 2) \cdots 3 \cdot 2 \cdot 1 \equiv -1 \pmod p.$$


Now

$$p - k \equiv -k \pmod p \quad \text{for } k = 1, 2, \ldots, 2n.$$

It follows that


$$(-1)(-2) \cdots (-2n)(2n)(2n - 1) \cdots 3 \cdot 2 \cdot 1 = ((2n)!)^2 (-1)^{2n} \equiv -1 \pmod p.$$


Thus T = (2n)! has $T^2 + 1 \equiv 0$ modulo p, proving the lemma.
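
The proof is constructive, so the square root of −1 it produces can be computed directly. A small Python sketch follows, assuming the input really is a prime congruent to 1 modulo 4; primality is not checked and the function name is ours.

```python
from math import factorial


def sqrt_of_minus_one(p):
    """For a prime p = 4n + 1, return T = (2n)! reduced modulo p,
    which satisfies T^2 + 1 ≡ 0 (mod p) by the argument of Lemma 2.7."""
    if p % 4 != 1:
        raise ValueError("expected a prime congruent to 1 modulo 4")
    n = (p - 1) // 4
    return factorial(2 * n) % p


for p in (5, 13, 17, 29, 37, 41):
    T = sqrt_of_minus_one(p)
    print(p, T, (T * T + 1) % p)
```

Computing (2n)! in full is wasteful for large p. Reducing modulo p after each multiplication would be the obvious refinement.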


PROOF OF THEOREM 2.6. The case p = 2 is trivial. The case when p is
congruent to 3 modulo 4 is also dealt with easily; no integer that is congruent
to 3 modulo 4 can be the sum of two squares because squares are 0 or 1
modulo 4.
Assume that p is a prime congruent to 1 modulo 4. By Lemma 2.7, we can
write
$$cp = T^2 + 1 = (T + i)(T - i) \quad \text{in } R = \mathbb{Z}[i]$$


for some integers T and c.

Suppose (for a contradiction) that p is irreducible in R. Then since Z[i] has
the Fundamental Theorem of Arithmetic, p is prime. Hence p must divide one
of $T \pm i$ in R since it divides their product, and this is impossible because p
does not divide the coefficient of i. It follows that p cannot be irreducible in R,
so


$$p = \mu\nu$$


is a product of two non-units in R. Taking the norm of both sides shows that
$$p^2 = N(\mu\nu) = N(\mu)N(\nu).$$


This is an equation in Z, so by the Fundamental Theorem of Arithmetic there
are three possibilities.


1. $N(\mu) = 1$ and $N(\nu) = p^2$, which is impossible since $\mu$ is not a unit;

2. $N(\nu) = 1$ and $N(\mu) = p^2$, which is impossible since $\nu$ is not a unit;

3. $N(\mu) = N(\nu) = p$, which must be the case, and this means there is a
nontrivial solution to the equation $x^2 + y^2 = p$.
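
Theorem 2.6 is easy to test experimentally. The Python sketch below uses a plain search rather than the factorization in Z[i] from the proof, so it only confirms the statement for small primes. Both helper names are ours.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def two_squares(n):
    """Return (x, y) with 0 <= x <= y and x^2 + y^2 = n, or None if there is none."""
    for x in range(isqrt(n // 2) + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            return (x, y)
    return None


for p in range(2, 60):
    if is_prime(p):
        print(p, p % 4, two_squares(p))
```

In the output, a representation appears exactly for p = 2 and for the primes with remainder 1 on division by 4.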


What is being witnessed here is a symbiotic relationship between certain 
Diophantine equations and the structure of an associated ring. To illustrate 
this, we now give a theorem that characterizes the primes of Z[i].


Theorem 2.8. The primes of R = Z[i] are of three types,


(1) 1 + i,
(2) integer primes $p \equiv 3$ modulo 4,
(3) factors x + iy of the integer primes $p \equiv 1$ modulo 4,


together with all multiples of these types by units.


Exercise 2.8. Prove Theorem 2.8. (Hint: Show that any prime in Z[i] divides 
a prime in Z.) 


Exercise 2.9. Prove that if a prime p is a sum of two squares, $p = a^2 + b^2$,
then this representation is unique (apart from the obvious changes).


Exercise 2.10. Prove that the positive integer n is a sum of two squares if
and only if every prime p with $p \equiv 3$ modulo 4 that divides n does so to an
even exponent.

One of the many classical results of elementary number theory extends
Theorem 2.6 to all integers, at the expense of allowing more squares to be added
together. Bachet conjectured the result, and Diophantus stated it; Fermat
may have had a proof. The first published proof was that of Lagrange in 1770,
which we now present.


Lemma 2.9. Let p be an odd prime. Then there are integers a and b with
$$a^2 + b^2 + 1 \equiv 0 \pmod p.$$


PROOF. Define the sets


$$A = \left\{a^2 \;\middle|\; 0 \le a \le \frac{p - 1}{2}\right\}$$


and


$$B = \left\{-b^2 - 1 \;\middle|\; 0 \le b \le \frac{p - 1}{2}\right\}.$$


No two elements of A are congruent modulo p, and no two elements of B are
congruent modulo p. It follows that each of the sets A and B contains $\frac{p + 1}{2}$
elements modulo p, so by the pigeonhole principle¹ there must be an element
of A that is equal to an element of B modulo p since there are only p distinct
integers modulo p. Thus there are integers a and b with


$$a^2 + b^2 + 1 \equiv 0 \pmod p$$


as required.
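
The pigeonhole argument guarantees a collision between A and B, and for a given p the collision can simply be looked up. A minimal Python sketch follows, assuming p is an odd prime; this is not checked, and the function name `lemma_2_9` is ours.

```python
def lemma_2_9(p):
    """For an odd prime p, return (a, b) with 0 <= a, b <= (p - 1)/2
    and a^2 + b^2 + 1 ≡ 0 (mod p)."""
    half = (p - 1) // 2
    # The set A reduced modulo p, remembering which a produced each residue.
    squares = {a * a % p: a for a in range(half + 1)}
    for b in range(half + 1):
        residue = (-b * b - 1) % p       # an element of B reduced modulo p
        if residue in squares:
            return squares[residue], b
    return None                          # not reached when p is an odd prime


for p in (3, 5, 7, 11, 13, 101):
    a, b = lemma_2_9(p)
    print(p, a, b, (a * a + b * b + 1) % p)
```

Because a and b are at most (p − 1)/2, the integer $a^2 + b^2 + 1$ is smaller than $p^2$, so the resulting multiple mp of p has m < p.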


Theorem 2.10. [LAGRANGE] Every positive integer is a sum of four integer
squares.


PROOF. The first step is to note the Euler four-square identity,


$$\begin{aligned}
(a^2 + b^2 + c^2 + d^2)(w^2 + x^2 + y^2 + z^2) ={}& (aw + bx + cy + dz)^2 \\
&+ (ax - bw - cz + dy)^2 \\
&+ (ay + bz - cw - dx)^2 \\
&+ (az - by + cx - dw)^2,
\end{aligned}$$


which may be proved simply by expanding the right-hand side. This identity
means that the property of being written as a sum of four squares is preserved
under products. By the Fundamental Theorem of Arithmetic, it is therefore
sufficient to prove that any prime is a sum of four integer squares. It is clear
that $2 = 1^2 + 1^2 + 0^2 + 0^2$ is a sum of four integer squares, so it is enough to
prove that any odd prime is a sum of four integer squares.

¹ The ‘pigeonhole’ principle states that if (Q + 1) letters are placed in Q pigeonholes,
one pigeonhole must contain more than one letter. It is readily proved by
contradiction.

Let p be an odd prime. By Lemma 2.9, there are integers a, b, c, d and m
(for example with c = 1 and d = 0) with

$$mp = a^2 + b^2 + c^2 + d^2. \tag{2.4}$$

If m = 1 then we are done, so assume that m > 1. The proof proceeds by
finding an expression for $m'p$ as a sum of four squares, with $0 < m' < m$. This
can be repeated, reducing the size of m each time, until we eventually must
find an expression for the prime p itself as a sum of four squares.

Now notice that if an even integer 2n is a sum of two squares, $2n = x^2 + y^2$,
then the integers x and y are either both even or both odd. It follows that the
identity

$$n = \left(\frac{x + y}{2}\right)^2 + \left(\frac{x - y}{2}\right)^2 \tag{2.5}$$

expresses n as a sum of two integer squares. Returning to Equation (2.4), if m
is even, then either none, two, or four of the numbers a, b, c, d are even. Thus
we can use Equation (2.5) twice to deduce that $\left(\frac{m}{2}\right)p$ is a sum of four squares.
In this case we have halved the size of m.

If m is odd, write

$$w \equiv a, \quad x \equiv b, \quad y \equiv c, \quad z \equiv d \pmod m$$

with w, x, y, z all lying in the interval $\left(-\frac{m}{2}, \frac{m}{2}\right)$. Then $w^2 + x^2 + y^2 + z^2 < 4 \cdot \frac{m^2}{4} = m^2$

and


$$w^2 + x^2 + y^2 + z^2 \equiv 0 \pmod m.$$


It follows that
$$w^2 + x^2 + y^2 + z^2 = km$$
for some k, 0 < k < m. Now in Euler’s four-square identity
$$\begin{aligned}
(a^2 + b^2 + c^2 + d^2)(w^2 + x^2 + y^2 + z^2) ={}& (aw + bx + cy + dz)^2 \\
&+ (ax - bw - cz + dy)^2 \\
&+ (ay + bz - cw - dx)^2 \\
&+ (az - by + cx - dw)^2
\end{aligned} \tag{2.6}$$


the left-hand side is $km^2p$. By our choice of w, x, y, z we see that $ax \equiv bw$
and $dy \equiv cz$ modulo m, so $(ax - bw - cz + dy)^2$ is divisible by $m^2$. A similar
argument shows that
$$(ay + bz - cw - dx)^2$$
and
$$(az - by + cx - dw)^2$$
are also divisible by $m^2$. For the first term,
$$aw + bx + cy + dz \equiv w^2 + x^2 + y^2 + z^2 \equiv 0 \pmod m,$$


so the right-hand side of Equation (2.6) is divisible by $m^2$. It follows that the
identity (2.6) can be divided through by $m^2$, resulting in an expression for kp
as a sum of four squares, with 0 < k < m.

Repeating this reduction a finite number of times will reduce m to 1, 
resulting in an expression for the odd prime p as a sum of four squares, 
completing the proof. 
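
Both ingredients of the proof can be explored numerically. The Python sketch below implements the four-square identity in the sign convention displayed above and, separately, a brute-force search for a representation. The search is not Lagrange's descent; it only confirms the theorem for small n. All names are ours.

```python
from math import isqrt


def euler_product(s, t):
    """Given s = (a, b, c, d) and t = (w, x, y, z), return the four numbers whose
    squares sum to (a^2+b^2+c^2+d^2)(w^2+x^2+y^2+z^2)."""
    a, b, c, d = s
    w, x, y, z = t
    return (a * w + b * x + c * y + d * z,
            a * x - b * w - c * z + d * y,
            a * y + b * z - c * w - d * x,
            a * z - b * y + c * x - d * w)


def four_squares(n):
    """Brute-force search for (a, b, c, d) with 0 <= a <= b <= c <= d
    and a^2 + b^2 + c^2 + d^2 = n, for an integer n >= 0."""
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            for c in range(b, isqrt(n - a * a - b * b) + 1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d >= c and d * d == rest:
                    return (a, b, c, d)
    return None


s, t = four_squares(7), four_squares(15)
u = euler_product(s, t)
print(s, t, u, sum(v * v for v in u))
```

The final line combines representations of 7 and 15 into a representation of 105. Some of the four resulting numbers may be negative, which does not matter once they are squared.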


Exercise 2.11. *[LEGENDRE] Show that every integer not of the form
$$4^n(8k + 7)$$
is a sum of three integer squares.


Exercise 2.12. Suppose a prime p is a sum of four squares. Is it true that 
the representation is unique? What if p is a sum of three squares? 


2.4 Siegel’s Theorem 


In this section, we show how a direct application of the Fundamental Theo- 
rem of Arithmetic in rings that are larger than the integers, for example the 
Gaussian integers Z[i], can yield all the integral solutions to certain cubic equa- 
tions. In the first example, we use the Fundamental Theorem of Arithmetic 
only in Z. 


Theorem 2.11. The only integral solution of the equation
$$y^2 = x^3 + x \tag{2.7}$$
is x = 0, y = 0.


PROOF. Let x and y be integers with $y^2 = x^3 + x$. Write the right-hand side
of the equation as $x^3 + x = x(x^2 + 1)$. Any factor of x will divide $x^2$, so any
factor common to x and $x^2 + 1$ will also divide 1. Thus x and $x^2 + 1$ must be
coprime and hence, by the Fundamental Theorem of Arithmetic, both must
be squares (since their product is $y^2$). Writing $z^2 = x^2 + 1$, we see that
$$1 = z^2 - x^2 = (z + x)(z - x).$$

By the Fundamental Theorem of Arithmetic in Z, $(z + x)$ and $(z - x)$ must
both be 1 or both be $-1$.

Solving for x and z shows that x = 0 in both cases.


Theorem 2.12. The only integral solution of the equation
$$y^2 = x^3 - 1 \tag{2.8}$$
is x = 1, y = 0.


For this equation, it looks as if we should factorize the right-hand side 
over Z, but it does not seem easy to get to the proof that way. Instead we 
factorize over a bigger ring that is also known to satisfy the Fundamental 
Theorem of Arithmetic. 


PROOF OF THEOREM 2.12. Rewrite the equation as
$$y^2 + 1 = x^3$$


and then factorize the left-hand side as $(y + i)(y - i)$ in Z[i]. We claim that the
two factors $y \pm i$ must be coprime. To see why, let $\delta = \gcd(y + i, y - i)$; $\delta$ must
divide the difference $y + i - (y - i) = 2i$. However, we claim that no factor
of 2 can divide $y + i$. This is because x must be odd; if x is even then $x^3 \equiv 0$
modulo 8, which means that $y^2 + 1 \equiv 0$ modulo 8 and this congruence has
no solutions. We deduce that $\delta$ must be a unit, and the two factors $y \pm i$ are
coprime in Z[i].

Applying the Fundamental Theorem of Arithmetic in Z[i], we deduce that
each factor $y + i$, $y - i$ must be a unit multiple of a cube. Since all units are
themselves cubes, we deduce that each of $y \pm i$ is a cube in Z[i], so assume

$$y + i = (a + bi)^3, \quad a, b \in \mathbb{Z}.$$
Equating imaginary parts gives
$$1 = 3a^2b - b^3 = b(3a^2 - b^2).$$

By the Fundamental Theorem of Arithmetic in Z, the solutions are greatly
restricted: $b = 3a^2 - b^2 = \pm 1$. If b = 1, then $3a^2 - 1 = 1$, which is impossible
as no integer a has $3a^2 = 2$. The only alternative is $b = -1$, in which case a = 0,
yielding the unique solution y = 0 and x = 1.
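
A finite search cannot replace the proofs of Theorems 2.11 and 2.12, but it is a useful sanity check on the statements. The sketch below, in which the helper name `integral_points` and the search bound are illustrative assumptions, lists the integer points on $y^2 = f(x)$ with $|x|$ below a chosen bound.

```python
from math import isqrt


def integral_points(f, bound):
    """All integer pairs (x, y) with y^2 = f(x) and |x| <= bound."""
    points = []
    for x in range(-bound, bound + 1):
        rhs = f(x)
        if rhs < 0:
            continue  # a negative number is never a square
        y = isqrt(rhs)
        if y * y == rhs:
            # The set removes the duplicate that arises when y = 0.
            points.extend({(x, y), (x, -y)})
    return sorted(points)


points_2_7 = integral_points(lambda x: x**3 + x, 1000)  # Equation (2.7)
points_2_8 = integral_points(lambda x: x**3 - 1, 1000)  # Equation (2.8)
```

Within this range the search finds only the point (0, 0) on the first curve and only (1, 0) on the second, in agreement with the two theorems.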


Exercise 2.13. Use the preceding method in the ring $\mathbb{Z}[\sqrt{-2}]$ to prove that
the only integral solutions of

$$y^2 = x^3 - 2$$

are $x = 3$, $y = \pm 5$.


Later we will be thinking of the set of solutions to equations such as these 
geometrically, so we will describe the solutions as points (x,y) in the plane. 
Now consider the example 


$$y^2 = x^3 - 3. \tag{2.9}$$

Experimentation with small integers suggests that there will be no integral
solutions, but we encounter a difficulty when we try to prove this using the
preceding methods. The reason is that the Fundamental Theorem of Arithmetic
does not hold in the ring $\mathbb{Z}[\sqrt{-3}]$ (see Exercise 3.17 on p. 73). On the
other hand, the Fundamental Theorem of Arithmetic does hold in the bigger
ring $\mathbb{Z}[\omega]$, where $\omega = e^{2\pi i/3}$ is a nontrivial cube root of unity.

This suggests that we might try to find all the solutions (x, y) over the
ring $\mathbb{Z}[\omega]$ as a precursor to finding all the solutions over the smaller ring Z.
This might seem audacious but historically this is just what happened in the
general case.


Theorem 2.13. [SIEGEL’S THEOREM] Suppose $a, b, c \in \mathbb{Q}$. Then there are
only finitely many integer pairs (x, y) with

$$y^2 = x^3 + ax^2 + bx + c, \tag{2.10}$$
provided the cubic polynomial $x^3 + ax^2 + bx + c$ has no repeated zeros.


This theorem will not be proved here; see the notes at the end of the
chapter for references where complete proofs may be found. The curve
described by an equation of the shape Equation (2.10) is known as an elliptic
curve provided the right-hand side has no repeated zeros. In order for Siegel’s
Theorem to hold, some condition about the cubic polynomial is clearly needed
because, for example, the equation $y^2 = x^3$ has infinitely many integral
solutions. We will devote considerable space to studying the remarkable properties
of elliptic curves.


Exercise 2.14. Prove that the polynomial $x^3 + ax + b$ has no repeated zero
if and only if $4a^3 + 27b^2 \neq 0$.


The genius of people such as Siegel is that they are willing to take an 
imaginative step up from particular cases, and are in addition able to supply 
the guile needed to complete the proof. In fact, he gave two different proofs of 
Theorem 2.13. In his second proof Siegel showed that there are only finitely 
many solutions (x,y) with x and y lying inside a suitably large ring con- 
taining Z in which the Fundamental Theorem of Arithmetic holds. The rings 
in which Siegel proposed to work typically contain infinitely many units, in 
contrast with the integers Z. We can appreciate some of the technical diffi- 
culties he had to overcome by considering the techniques that went into his 
second proof in some special cases. His second proof turned out to be very 
important: He first reduced the given equation to a finite number of linear 
equations over a finitely generated group. Subsequently, methods were devel- 
oped in Diophantine Approximation that applied to these linear equations and 
allowed, ultimately, a practical method for finding all the integral solutions of 
the equation in Theorem 2.13. 

Exercise 2.15. Fix a square-free integer d > 1, and assume that $\mathbb{Z}[\sqrt{d}]$
satisfies the Fundamental Theorem of Arithmetic. Show that the equation

$$y^2 = x^3 + d$$

has only finitely many integral solutions. (Hint: You may assume that the
units of the ring $\mathbb{Z}[\sqrt{d}]$ are all of the form $\pm u^n$ for some unit u > 1.)


The rings Siegel worked with are obtained by inverting certain chosen
primes. This technique provides us with a new class of rings to study. As a
simple illustrative example, let S denote the set {2} and let $\mathbb{Z}_S$ denote the
ring $\mathbb{Z}[\frac{1}{2}]$ consisting of all rational numbers with a denominator consisting of
a power of 2. Given any nonzero $q \in \mathbb{Q}$, write $q = 2^r q'$, where $r \in \mathbb{Z}$ and
the numerator and denominator of $q'$ are odd. Define the S-norm of q to
be $|q|_S = |q'|$. The ring $\mathbb{Z}_S$ has infinitely many units, consisting of the rational
numbers $\pm 2^k$ for $k \in \mathbb{Z}$. The ring $\mathbb{Z}_S$ is sometimes called the ring of S-integers
of Z, and its units are known as S-units.
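
The S-norm for S = {2} is easy to compute exactly. The following sketch uses Python's `fractions` module; the function name `s_norm` is a hypothetical choice made for illustration.

```python
from fractions import Fraction


def s_norm(q):
    """S-norm |q|_S for S = {2}: remove every factor of 2 from q and return |q'|."""
    q = Fraction(q)
    if q == 0:
        raise ValueError("the S-norm is defined here only for nonzero q")
    num, den = abs(q.numerator), q.denominator
    while num % 2 == 0:
        num //= 2
    while den % 2 == 0:
        den //= 2
    return Fraction(num, den)
```

With this definition `s_norm(12)` is 3 and `s_norm(Fraction(5, 8))` is 5, while every number of the form $\pm 2^k$ has S-norm 1, matching the description of the S-units.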


Exercise 2.16. Prove that the ring $\mathbb{Z}_S$ is a Euclidean ring with respect to $|\cdot|_S$.


The next exercise will provide a further illustration of some of the
techniques needed to prove Siegel’s Theorem. We have already seen examples
where the Fundamental Theorem of Arithmetic fails in some quadratic rings.
We overcame that failure in $\mathbb{Z}[\sqrt{-3}]$ by working in the bigger ring $R = \mathbb{Z}[\omega]$,
where $\omega$ is a nontrivial cube root of unity. Letting S = {2} as before, R is a
subring of an even bigger ring $R_S = \mathbb{Z}[\sqrt{-3}, \frac{1}{2}]$.

Exercise 2.17. Define a norm function on $R_S = \mathbb{Z}[\sqrt{-3}, \frac{1}{2}]$ with the property
that $R_S$ is a Euclidean ring. Find all solutions to Equation (2.9) in the ring $R_S$.
Again, this exercise shows there are only finitely many solutions to a specific
cubic equation in a ring with infinitely many units.


Theorem 2.14 below is quite deep and we will not prove it. The proof 
requires Theorem 4.14 from Chapter 3. The notes at the end of the chapter 
reference a proof in the literature. It shows that the Fundamental Theorem 
of Arithmetic in $\mathbb{Z}[\sqrt{d}]$ can be recovered by inverting a finite list of primes.


Theorem 2.14. Let d be a nonsquare integer. There is a finite list of primes

$$p_1, \ldots, p_r$$

with the property that $\mathbb{Z}[\sqrt{d}, \frac{1}{p_1}, \ldots, \frac{1}{p_r}]$ has the Fundamental Theorem of
Arithmetic.


Combining the techniques learned thus far allows a special case of Siegel’s 
Theorem to be proved. An integer is called square-free if it is not divisible by 
the square of any integer greater than 1. 

Exercise 2.18. Suppose d < 0 is a square-free integer with the property
that $\mathbb{Z}[\sqrt{d}, \frac{1}{p}]$ has the Fundamental Theorem of Arithmetic for some prime p.
Show that the equation

$$y^2 = x^3 + d \tag{2.11}$$

has only finitely many integral solutions.


When explicit approaches such as this succeed, they allow the determina- 
tion of all the integral solutions. Determining all the integral solutions pre- 
dicted by Siegel’s Theorem is generally quite a difficult problem and requires 
powerful methods from transcendence theory. It was not until late in the twen- 
tieth century that these methods were sufficiently well advanced to allow for 
a practical method of solving a given equation. 

An S-unit equation is one of the form 


$$a_1 x_1 + \cdots + a_n x_n = 1$$


with $a_1, \ldots, a_n$ fixed constants in some field K, and the solutions $x_i$ are sought in
a finitely generated subgroup of $K^*$. For the cubic equations studied here,
Siegel reduced the problem of finding all the integral solutions to finding the 
solutions of a finite number of S-unit equations all having n = 2. He then 
showed that such an equation has only finitely many solutions. In general S- 
unit equations turn out to lie behind many other Diophantine equations and 
they have come to be studied as important in their own right. 
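
As a toy illustration of an S-unit equation with n = 2, take $K = \mathbb{Q}$, $a_1 = a_2 = 1$, and the subgroup of $\mathbb{Q}^*$ generated by $-1$ and 2; this particular choice is an assumption made only for the example. The sketch below searches for solutions of $x + y = 1$ among the units $\pm 2^k$ with $|k|$ below a bound.

```python
from fractions import Fraction
from itertools import product


def s_unit_solutions(max_exp):
    """Solutions of x + y = 1 with x and y of the form +-2^k, |k| <= max_exp."""
    units = [sign * Fraction(2) ** k
             for sign, k in product((1, -1), range(-max_exp, max_exp + 1))]
    unit_set = set(units)
    # For each unit x, the only candidate for y is 1 - x.
    return sorted((x, 1 - x) for x in units if (1 - x) in unit_set)
```

However large the exponent bound is taken, the search returns the same three solutions, (-1, 2), (1/2, 1/2) and (2, -1). This is consistent with the finiteness statement, although a bounded search does not by itself prove it.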


2.5 Fermat, Catalan, and Euler 


Finally we mention three famous Diophantine problems, all of which have 
recently been solved. There are detailed references in the notes at the end of 
the chapter. 


2.5.1 Fermat 
Fermat’s Last Theorem, now proved by Wiles, states that the equation
$$x^n + y^n = z^n, \quad n > 2, \tag{2.12}$$


has no nontrivial solutions. (A solution is trivial if one of x,y or z is zero.) 
Clearly, it is only necessary to prove this in the case when n = p is a prime. 
A startling aspect of the solution is that it depends on deep results concerning
the arithmetic of elliptic curves: If $a^p + b^p + c^p = 0$ for a prime p and
integers a, b, c, then the elliptic curve with equation


$$y^2 = x(x - a^p)(x + b^p)$$


turns out to have properties that Wiles was able to show were impossible. We 
will be studying the arithmetic of elliptic curves in Chapters 5 and 6. 


Exercise 2.19. *Prove that Equation (2.12) has no nontrivial solutions with n
equal to 3, 4, or 5.


Exercise 2.20. *Prove that Equation (2.12) has no nontrivial solutions in 
Gaussian integers with n = 4. 


Exercise 2.21. *Prove that Equation (2.12) has no solutions x, y, z in posi- 
tive integers with n a Gaussian integer. 


2.5.2 Catalan 
The Catalan equation is
$$u^x - v^y = 1, \quad u, v, x, y \in \mathbb{N}, \quad u, v, x, y \geq 2. \tag{2.13}$$


A solution is $3^2 - 2^3 = 1$; the Catalan problem is to show that there are no
others, and this has recently been proved.


2.5.3 Euler 


Euler conjectured that an nth power cannot be written as the sum of fewer
than n nontrivial nth powers for $n \geq 3$. Lander and Parkin made a computer
search for nontrivial solutions to the Diophantine equation


$$\sum_{i=1}^{n} x_i^5 = y^5, \quad n \leq 6.$$


Among the solutions, they found a counterexample to Euler’s conjecture 
for n = 5. Their resulting announcement matches the famous seminar of Cole 
described on p. 27 for its brevity and drama: The entire text of their paper is 
as follows. 


“A direct search on the CDC 6600 yielded 
$$27^5 + 84^5 + 110^5 + 133^5 = 144^5$$


as the smallest instance in which four fifth powers sum to a fifth 
power. This is a counterexample to a conjecture by Euler [see L. E. 
Dickson, History of the theory of numbers, Vol. 1, p. 648, Chelsea, 
New York, 1952] that at least n nth powers are required to sum to 
an nth power, n > 2.” 


In addition, it was shown that the case n = 4, namely the Diophantine
equation
$$u^4 + v^4 + w^4 = x^4, \tag{2.14}$$


has no solutions in positive integers with x < 220000.

In a dramatic development, Elkies used a mixture of sophisticated theory
and a computer search to find a solution to Equation (2.14),


$$2682440^4 + 15365639^4 + 18796760^4 = 20615673^4. \tag{2.15}$$


Following this, Roger Frye found that the minimal solution to Equation (2.14)
is
$$95800^4 + 217519^4 + 414560^4 = 422481^4$$


and showed that there are no other solutions with u < v < w < x < 1000000.
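
The three identities quoted in this section can be confirmed with exact integer arithmetic. The short check below is only a verification sketch (the names `IDENTITIES` and `check_identities` are illustrative); it says nothing about how the solutions were found or about minimality.

```python
# Each entry pairs the left-hand side with the right-hand side of a quoted identity.
IDENTITIES = {
    "Lander and Parkin (fifth powers)": (27**5 + 84**5 + 110**5 + 133**5, 144**5),
    "Elkies, Equation (2.15)": (2682440**4 + 15365639**4 + 18796760**4, 20615673**4),
    "Frye (minimal solution)": (95800**4 + 217519**4 + 414560**4, 422481**4),
}


def check_identities():
    """Map each identity name to True when both sides agree exactly."""
    return {name: lhs == rhs for name, (lhs, rhs) in IDENTITIES.items()}
```

Python integers have arbitrary precision, so no rounding is involved in these comparisons.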


NOTES TO CHAPTER 2: Much of the material in this chapter is part of algebraic 
number theory. Stewart’s book [147] is an accessible introduction at this level; for 
more advanced treatments, see the books of Hasse [76], Janusz [83] or Lang [96]. A 
sophisticated text on related topics is Serre’s classic book [137]. Barbeau’s book [10] 
discusses Pell’s equation in detail and requires very little background. A proof of 
Theorem 2.14 can be found in Lang [96, Chapter I, Proposition 17]. The seminal 
finiteness results on S-unit equations mentioned at the end of Section 2.4 may be 
found in the papers of Evertse [60], Schlickewei [134], and van der Poorten and 
Schlickewei [120]. These results have found wide application; a surprising connec- 
tion to ergodic theory is shown in a paper of Schmidt and Ward [135]. For attractive 
accounts of Fermat’s Last Theorem, see the popular accounts of Ribenboim [126] 
and van der Poorten [119]; a serious introduction at a high level to the mathematics 
behind Wiles’ extraordinary proof [162] may be found in the proceedings [35] of an in- 
structional conference edited by Cornell, Silverman and Stevens. Exercise 2.20 comes 
from a short note by Cross [38]; Exercise 2.21 comes from a paper of Zuehlke [169] 
and uses some transcendence theory. The Catalan problem Equation (2.13) was ini- 
tially reduced to a finite calculation and then solved completely by Mihailescu; see 
the paper of Metsankyla [106] for an account and the monograph [58, p. 159] by 
Everest, van der Poorten, Shparlinski and Ward for an overview of related ques- 
tions. An accessible account of the Catalan problem before its final solution may be 
found in the book of Ribenboim [124]. The results of Lander and Parkin appeared 
in their paper [92]; their dramatic announcement quoted in Section 2.5.3 is [91]. 
The state of Euler’s problem in 1967 is surveyed in a paper of Lander, Parkin and 
Selfridge [93]. Equation (2.15) of Elkies is in [49]. 


Quadratic Diophantine Equations


Attempts to go beyond the Pythagorean Diophantine equation quickly lead 
to general questions about quadratic Diophantine problems. Apparently sim- 
ple questions seem to require an excursion into the theory of finite fields. 
For example, we prove that any finite field has a primitive root in order to 
develop the classical theory of the Legendre symbol and the Quadratic Reci- 
procity Law. Some general theory of quadratic rings and quadratic forms is 
established, up to the finiteness of the class number for quadratic forms. 


3.1 Quadratic Congruences 


Suppose we now seek to generalize our earlier results and understand the
Diophantine equation

$$x^2 + 2y^2 = p \tag{3.1}$$
when p is a prime and x and y are integers. We can do this by using properties
of the ring $\mathbb{Z}[\sqrt{-2}]$, but we also need a better understanding of the arithmetic
of the integers modulo p when p is a prime.


Exercise 3.1. Let $R = \mathbb{Z}[\sqrt{-2}]$.
(a) Show that the function $N : R \to \mathbb{N}$ defined by


$$N(x + y\sqrt{-2}) = x^2 + 2y^2$$


satisfies $N(\alpha\beta) = N(\alpha)N(\beta)$ for all $\alpha, \beta \in R$.
(b) Determine all the units in R.
(c) Show that R is Euclidean with respect to N.


Following our earlier method, we now expect to use unique factoriza- 
tion in R together with some knowledge of congruences to understand Equa- 
tion (3.1). The relevant congruence to study for this equation is 


$$T^2 + 2 \equiv 0 \pmod{p}. \tag{3.2}$$


Exercise 3.2. Compute the list of primes p < 1000 for which the congruence
(3.2) has a solution with $T \in \mathbb{Z}$.


It is becoming clear that we need some tool that will guarantee the
existence of a solution for certain congruences and rule out a solution for
others. For example, your computations in Exercise 3.2 should suggest that for
primes $p \equiv 1$ or 3 modulo 8 there is a solution, while there is no solution for
primes $p \equiv 5$ or 7 modulo 8. (The prime p = 2 does give a solution.) Our
earlier approach suggests that the area we need to look at is the arithmetic
of Z/pZ. Previously we used al-Haytham’s Theorem in a crucial way, and here
we have no obvious analog. It turns out that the property we need is directly
related to a natural concept in group theory.
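
The pattern described above can be observed by direct computation. The following sketch tests the congruence (3.2) by trying every residue T; the function names are illustrative, and the brute-force search is a choice made for clarity rather than efficiency.

```python
def primes_below(n):
    """All primes less than n, by the sieve of Eratosthenes."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(range(i * i, n, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def has_root(p):
    """True if T^2 + 2 = 0 (mod p) has a solution T."""
    return any((t * t + 2) % p == 0 for t in range(p))


def solvable_primes(n):
    """Primes p < n for which the congruence (3.2) is solvable."""
    return [p for p in primes_below(n) if has_root(p)]
```

Grouping the output of `solvable_primes(1000)` by residue modulo 8 shows only the classes 1 and 3, apart from the prime 2.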


Definition 3.1. An element a of Z/pZ is a primitive root modulo p if the 
powers of a generate all the nonzero residues modulo p. 


Example 3.2. It is easy to prove that the powers of 2 yield all the nonzero
residues modulo 5: $2^0 \equiv 1$, $2^1 \equiv 2$, $2^2 \equiv 4$, $2^3 \equiv 3$ modulo 5. Thus 2 is a
primitive root modulo 5. Similarly, 3 is a primitive root modulo 7, but 2 is
not since no power of 2 is congruent to 3 modulo 7.
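
Definition 3.1 translates directly into a test: list the powers of a and see whether every nonzero residue appears. The function below is a minimal sketch with a hypothetical name, suitable only for small primes.

```python
def is_primitive_root(a, p):
    """True if the powers of a run through every nonzero residue modulo the prime p."""
    seen, x = set(), 1
    for _ in range(p - 1):
        x = (x * a) % p
        seen.add(x)
    return len(seen) == p - 1
```

Here `is_primitive_root(2, 5)` and `is_primitive_root(3, 7)` hold, while `is_primitive_root(2, 7)` fails because the powers of 2 modulo 7 are only 1, 2 and 4.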


The set of residues modulo p forms a field: The existence of a primi- 
tive root a modulo p is the same as the statement that the multiplicative 
group (Z/pZ)* of the field Z/pZ is cyclic, generated by a. We will use freely 
other equivalent ways of saying this. If G denotes a finite Abelian group with n 
elements, written multiplicatively, then a generates G if and only if any of the 
following equivalent conditions hold: 


1. $a^m = 1$, $1 \leq m \leq n$ $\implies$ $m = n$;

2. the order of a is n;

3. $a^m = 1$, $1 \leq m$ $\implies$ $n \mid m$.

Theorem 3.3. The multiplicative group of any finite field is cyclic. 


This is an important result, and we will spend some time proving it. When 
we have done this, we can return to our equations. The proof of Theorem 3.3 
involves an important example of an arithmetic function. 


Definition 3.4. An arithmetic function is any function $f : \mathbb{N} \to \mathbb{C}$. An
arithmetic function with $f(1) \neq 0$ and


$$f(mn) = f(m)f(n)$$
whenever m and n are coprime is called multiplicative. (Note that this
implies $f(1) = 1$.) If f has this property not only for coprime m, n, but for
all $m, n \in \mathbb{N}$, then f is called completely multiplicative.

Multiplicative arithmetic functions will be discussed further in Section 8.2.
One of the most important arithmetic functions is


$$\varphi(n) = |\{1 \leq a \leq n : \gcd(a, n) = 1\}|,$$
called the Euler phi-function.

Exercise 3.3. Let p be a prime. Show that $\varphi(p^e) = p^{e-1}(p - 1)$ for any $e \geq 1$.

Lemma 3.5. The Euler phi-function is multiplicative.


We will postpone the proof slightly to note an immediate corollary of 
Lemma 3.5 and Exercise 3.3. 


Corollary 3.6. If n is factorized into powers of distinct primes, $n = \prod_{p \mid n} p^{e_p}$,
then
$$\varphi(n) = \prod_{p \mid n} (p - 1)p^{e_p - 1} = n \prod_{p \mid n} \frac{p - 1}{p}.$$
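
The product formula can be compared with the definition of the Euler phi-function by direct computation. In the sketch below both function names are illustrative; the second function finds the primes dividing n by trial division, which is adequate for small n.

```python
from math import gcd


def phi_by_definition(n):
    """Count the integers a with 1 <= a <= n and gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def phi_by_formula(n):
    """Evaluate n * prod (p - 1)/p over the primes p dividing n."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p  # multiply result by (p - 1)/p
        p += 1
    if m > 1:  # one prime factor larger than sqrt(n) may remain
        result -= result // m
    return result
```

The two functions agree for every n that is feasible to test this way. For instance both return 12 for n = 36, which also equals the product of the values 2 and 6 at the coprime factors 4 and 9.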


Exercise 3.4. Give an example to show that $\varphi$ is not completely
multiplicative.


Exercise 3.5. (a) Find all values of $n \in \mathbb{N}$ with $\varphi(n) = \frac{1}{2}n$.

(b) Find all values of $n \in \mathbb{N}$ with $\varphi(n) = \varphi(2n)$.

(c) Find all six values of $n \in \mathbb{N}$ with $\varphi(n) = 12$.

(d) Find the smallest n € N for which 2@ < 1. 

(e) Find a sequence of integers $(n_j)$ for which $\varphi(n_j)/n_j \to 0$ as $j \to \infty$.


The proof of Lemma 3.5 depends on the following result. 


Theorem 3.7. [CHINESE REMAINDER THEOREM] Suppose $m, n \in \mathbb{N}$ are
coprime. Then the simultaneous congruences


$$x \equiv a \pmod{m},$$
$$x \equiv b \pmod{n},$$


have a solution $x \in \mathbb{N}$ for any $a, b \in \mathbb{Z}$, and the solution is unique modulo mn.


The Chinese Remainder Theorem was discovered by Chinese mathematicians
in the fourth century A.D. The first appearance seems to have been in
a work of Sun-Zi, and a general treatment was given by Qin¹ Jiushao. Special
results of the same sort were used by Fibonacci in Italy and al-Haytham in
Iraq. We will see it again in Chapter 12 (see p. 256) in greater generality.


¹ Also transliterated as Ch’in Chiu-Shao. Jiushao seems to have been both a rogue
and a mathematical genius. His work Shushu Jiuzhang (Mathematical Treatise
in Nine Sections) appeared in 1247 and contained many important and novel
results and methods. The so-called Chinese Remainder Theorem is among these,
attributed to experts in astronomy and calendars. It has been suggested that the
theorem does not bear his name because in the form ‘Chin’ it was too easily
confused with ‘Chinese’.

PROOF OF THE CHINESE REMAINDER THEOREM. The coprimality condition
guarantees that there exist $m', n'$ such that


$$mm' \equiv 1 \pmod{n} \quad \text{and} \quad nn' \equiv 1 \pmod{m} \tag{3.3}$$


by Corollary 1.25. Then $x = bmm' + ann'$ satisfies both the required
congruences.

If, on the other hand, x and y satisfy both congruences, then $(x - y)$ is
divisible by m and by n. Since m and n are coprime, $(x - y)$ must be divisible
by mn.


Example 3.8. Solve the simultaneous congruences $x \equiv 2$ modulo 17 and $x \equiv 8$
modulo 11. We find $m' = 2$ and $n' = 14$ in the proof of the Chinese Remainder
Theorem. Then

$$x = 8 \cdot (17 \cdot 2) + 2 \cdot (11 \cdot 14) = 580$$


satisfies the two congruences. (The smallest solution is the remainder of 580
divided by $11 \cdot 17$, namely 19.)
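
The construction in the proof of the Chinese Remainder Theorem can be written out in a few lines. This sketch assumes Python 3.8 or later, where the built-in `pow(m, -1, n)` returns an inverse of m modulo n; the function name `crt` is illustrative.

```python
def crt(a, m, b, n):
    """Smallest nonnegative x with x = a (mod m) and x = b (mod n), for coprime m, n."""
    m_inv = pow(m, -1, n)  # m' with m*m' = 1 (mod n)
    n_inv = pow(n, -1, m)  # n' with n*n' = 1 (mod m)
    # x = b*m*m' + a*n*n' as in the proof, reduced modulo mn.
    return (b * m * m_inv + a * n * n_inv) % (m * n)
```

For the data of Example 3.8 the inverses computed are 2 and 14, the unreduced value is 580, and `crt(2, 17, 8, 11)` returns 19.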

PROOF OF LEMMA 3.5. Let m and n be coprime. Define a map
$$\Phi : \mathbb{Z}/mn\mathbb{Z} \to \mathbb{Z}/m\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z}$$
by
$$x \mapsto (x \pmod{m}, x \pmod{n}),$$


where we think of the elements of Z/mnZ as $\{0, 1, \ldots, mn - 1\}$. By the Chinese
Remainder Theorem, $\Phi$ is a bijection. (In fact, $\Phi$ is an isomorphism of rings.)
Now define

$$(\mathbb{Z}/m\mathbb{Z})^* = \{1 \leq a \leq m : \gcd(a, m) = 1\}$$


and likewise for n and mn. Since x is coprime to mn if and only if it is coprime
both to m and n, $\Phi$ restricts to these subsets:


$$\Phi : (\mathbb{Z}/mn\mathbb{Z})^* \to (\mathbb{Z}/m\mathbb{Z})^* \times (\mathbb{Z}/n\mathbb{Z})^*.$$


Here $\Phi$ is still a bijection. (In fact, the set $(\mathbb{Z}/k\mathbb{Z})^*$ is the set of units $U(\mathbb{Z}/k\mathbb{Z})$
of Z/kZ, and $\Phi$ is an isomorphism of (multiplicative) groups.) By definition,
the cardinality of $(\mathbb{Z}/m\mathbb{Z})^*$ is just $\varphi(m)$ and likewise for n and mn, which
completes the proof of Lemma 3.5.


The next exercise is a generalization of Fermat’s Little Theorem (Theo- 
rem 1.12), called the Euler-Fermat Theorem. 


Exercise 3.6. Given n > 1 in N, show that for any $a \in \mathbb{Z}$ with $\gcd(a, n) = 1$


$$a^{\varphi(n)} \equiv 1 \pmod{n}. \tag{3.4}$$

Exercise 3.6 is a pretty standard one found in most texts that deal with
the $\varphi$-function. It is a good test case for our earlier remarks about how different
approaches can yield different benefits. It is possible to prove Equation (3.4)
using congruences modulo $p^r$ for each prime power $p^r$ dividing n, together
with the Binomial Theorem. Another, slicker, proof simply uses Lagrange’s
Theorem on the group $U(\mathbb{Z}/n\mathbb{Z}) = (\mathbb{Z}/n\mathbb{Z})^*$.


Theorem 3.9. For any $n \in \mathbb{N}$,


$$\sum_{d \mid n} \varphi(d) = n. \tag{3.5}$$


PROOF. First check the equality when $n = p^r$ is a prime power. The left-hand
side is
$$\varphi(1) + \varphi(p) + \cdots + \varphi(p^r) = 1 + (p - 1) + (p^2 - p) + \cdots + (p^r - p^{r-1}) = p^r$$
by summing the geometric progression or noticing that it is a telescoping sum.
Next, observe that both sides of Equation (3.5) are multiplicative arithmetic
functions. For the left-hand side, this follows from


$$\sum_{d \mid mn} \varphi(d) = \sum_{d_1 \mid m} \sum_{d_2 \mid n} \varphi(d_1 d_2) = \sum_{d_1 \mid m} \varphi(d_1) \sum_{d_2 \mid n} \varphi(d_2)$$


for any pair of coprime integers (m, n). Note that d divides mn if and only if
there exist divisors $d_1$ of m and $d_2$ of n such that $d = d_1 d_2$, so it is enough to
check the prime power case.
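
Theorem 3.9 is easy to confirm numerically for small n. The sketch below computes the phi-function straight from its definition and sums it over the divisors of n; the function names are illustrative and the code favors clarity over speed.

```python
from math import gcd


def phi(n):
    """Euler phi-function computed from the definition."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def divisor_phi_sum(n):
    """Sum of phi(d) over all positive divisors d of n."""
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)
```

For n = 12 the divisors 1, 2, 3, 4, 6, 12 contribute 1 + 1 + 2 + 2 + 2 + 4 = 12.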


We can now prove Theorem 3.3. In the proof, we will be working with 
a general finite field. Such a field can always be explicitly presented using 
polynomials; however, nowhere will we need an explicit presentation. This 
suggests that more abstract methods might also be applicable to prove the 
theorem. Indeed, a proof can be given that only uses the theory of finite 
Abelian groups. 

PROOF OF THEOREM 3.3. Let F be a finite field with q elements. We are
going to prove that if g is any element of $F^*$, then $g^j$ has the same order as g
if and only if $\gcd(j, q - 1) = 1$. This will allow us to find how many elements
there are of each order, showing in particular that there are $\varphi(q - 1)$ distinct
generators in total.


Example 3.10. The distinct powers of 3 in $\mathbb{F}_7^*$ are


$$3^1 = 3, \quad 3^2 = 2, \quad 3^3 = 6, \quad 3^4 = 4, \quad 3^5 = 5, \quad 3^6 = 1.$$


The only values of j, $1 \leq j \leq 6$ with $\gcd(j, 6) = 1$ are 1 and 5. Since $3^5 \equiv 5$
modulo 7, 5 is another generator of $\mathbb{F}_7^*$. Similarly, $\mathbb{F}_{11}^* = \langle 2 \rangle$ (the
multiplicative group generated by 2). The values of j between 1 and 10 for
which $\gcd(j, 10) = 1$ are $j = 1, 3, 7, 9$ so there are four possibilities for
generators of $\mathbb{F}_{11}^*$, namely $2^1 = 2$, $2^3 = 8$, $2^7 = 7$, $2^9 = 6$.


Exercise 3.7. Prove that in any field, a polynomial of degree d has no more
than d zeros. (Hint: Use Exercise 2.5 on p. 47.)


Returning to the proof of Theorem 3.3, suppose $d \mid (q - 1)$ and a is an
element of $F^*$ of order d (if one exists). Then


$$a^d = 1 \text{ in } F \text{ and } a^m = 1 \text{ with } 0 \leq m < d \text{ implies } m = 0.$$


The elements $1, a, a^2, \ldots, a^{d-1}$ are all distinct, otherwise $a^i = a^j$ would imply
that $a^k = 1$ with some $0 < k < d$. We claim that if an element a of order d
exists, then the other elements of order d in $F^*$ are precisely those powers $a^j$
with $1 \leq j < d$ and $\gcd(j, d) = 1$. Thus if there is an element of order d, then
there will be precisely $\varphi(d)$ of them. If a does have order d, then the only other
elements of order d must lie among the powers $a^j$ above since any element of
order d satisfies the equation


$$x^d - 1 = 0$$


in F, this equation has at most d roots by Exercise 3.7, and each of the
powers $a^j$, $0 \leq j < d$ satisfies the equation. Thus all the elements of order d
must lie among these powers. But which of the powers have order d? We now
prove our claim that $a^j$ has order d, $1 \leq j < d$, if and only if $\gcd(j, d) = 1$.
If $1 < \gcd(j, d) = d' < d$ then $1 < d/d' < d$ and


$$(a^j)^{d/d'} = (a^d)^{j/d'} = 1^{j/d'} = 1,$$


so $a^j$ does not have order d (since $d/d' < d$).
Conversely, suppose that $\gcd(j, d) = 1$ and $a^j$ has order $d''$ with


$$1 \leq d'' \leq d.$$


Then $a^{jd''} = 1$, so $d \mid jd''$ since a has order d. However, $\gcd(d, j) = 1$, which
forces $d \mid d''$. On the other hand, $d'' \leq d$, so we must have $d = d''$. This
completes the proof that $a^j$ has order d if and only if $\gcd(j, d) = 1$.

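Each of the $(q - 1)$ elements of $F^*$ has order dividing $(q - 1)$, and we have
shown that for each d dividing $(q - 1)$ there are either no elements of order d
or exactly $\varphi(d)$ of them. By Theorem 3.9,
$$\sum_{d \mid (q-1)} \varphi(d) = q - 1.$$
Thus, for every d dividing $(q - 1)$, we must have $\varphi(d)$ elements (not none) of
order d, since otherwise the orders could not account for all $(q - 1)$ elements.
In particular, we have $\varphi(q - 1) \geq 1$ elements of order $(q - 1)$.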


Notice that we have proved a little more than Theorem 3.3. The proof
shows how many elements of $F^*$ there are of each possible order, finding in
particular that there must be at least one element of order $(q - 1)$, which is
therefore a primitive root.
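
The count of elements of each order can be checked in the field Z/pZ for a small prime p. The sketch below is illustrative only: the function names are hypothetical and orders are found by repeated multiplication.

```python
from collections import Counter
from math import gcd


def order(a, p):
    """Multiplicative order of a modulo the prime p, for a not divisible by p."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k


def order_counts(p):
    """Number of elements of each order in the multiplicative group modulo p."""
    return Counter(order(a, p) for a in range(1, p))


def phi(n):
    """Euler phi-function computed from the definition."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)
```

For p = 19 the divisors d = 1, 2, 3, 6, 9, 18 of p - 1 occur as orders with multiplicities 1, 1, 2, 2, 6, 6, which are exactly the values of the phi-function at d.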


Exercise 3.8. Verify that 2 is a primitive root for the prime p = 19. Find all 
the elements of order 6 under multiplication modulo 19, expressed as integers 
between 1 and 18. 

Despite the seemingly complete knowledge provided by the proof of
Theorem 3.3, several closely related questions turn out to be extremely difficult.
The following is a famous conjecture of Artin which remains an open problem.


Conjecture 3.11. [ARTIN] Any integer that is not a square or $-1$ is a primitive
root modulo p for infinitely many primes p.


An apparently less ambitious question is to ask, given an explicitly
presented finite field, whether there is an algorithm for determining a primitive
root. For example, if p is a given prime, can we determine a primitive root
for p? The most obvious thing to try is checking the integers 2, 3, 5, 6, ... (not 4
of course!) in the hope that a primitive root will soon be found. Thus one seeks
an upper bound on the smallest primitive root, and this too is difficult. The
smallest primitive root modulo p can be shown, conditionally, to be bounded
by a constant multiple of $(\log p)^6$, a result of Shoup from 1992. However this
result relies upon a hard unproven hypothesis stated in Section 12.7.1. This
might not sound very satisfactory, but it turns out to have great practical
value.


3.2 Euler’s Criterion 


Many problems concerning quadratic congruences can be reduced to solving
the simplest such congruence, namely $x^2 \equiv a$ modulo p for a prime p and
given a.


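Definition 3.12. Let p be an odd prime and a an integer. The Legendre
symbol is defined by


$$\left(\frac{a}{p}\right) = \begin{cases} 0 & \text{if } p \mid a, \\ 1 & \text{if } p \nmid a \text{ and } x^2 \equiv a \pmod{p} \text{ has a solution,} \\ -1 & \text{otherwise.} \end{cases}$$
If $a \not\equiv 0$ and $\left(\frac{a}{p}\right) = -1$ then a is a quadratic nonresidue modulo p;
otherwise a is a quadratic residue modulo p.

Some elementary properties of the Legendre symbol will be used without
comment. In particular, if $a \equiv b$ modulo p, then $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$, and $\left(\frac{a^2}{p}\right) = 1$ for
any $a \not\equiv 0$ modulo p.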


Theorem 3.13. [EULER’S CRITERION] Let p be an odd prime. Then


$$\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod{p}. \tag{3.6}$$

PROOF. The statement is obvious if $a \equiv 0$ modulo p, so assume that a is
coprime to p. Notice that the only square roots of 1 modulo p are congruent
to $\pm 1$ since $x^2 - 1 = (x - 1)(x + 1)$ in any field.

Now
$$\left(a^{(p-1)/2}\right)^2 = a^{p-1} \equiv 1 \pmod{p}$$
by Fermat’s Little Theorem, so
$$a^{(p-1)/2} \equiv \pm 1 \pmod{p}.$$


Let g denote a generator of the cyclic group $(\mathbb{Z}/p\mathbb{Z})^*$. Then $a \equiv g^j$ modulo p
for some j, and a is a quadratic residue if and only if j is even. Suppose a is
a quadratic residue, so $j = 2j'$ for some integer $j'$. It follows that


$$a^{(p-1)/2} \equiv (g^j)^{(p-1)/2} = g^{j'(p-1)} = (g^{p-1})^{j'} \equiv 1 \pmod{p}.$$


Thus $\left(\frac{a}{p}\right) = 1$ implies that $a^{(p-1)/2} \equiv 1$ modulo p.
Conversely, if $a^{(p-1)/2} \equiv 1$ modulo p, then $g^{j(p-1)/2} \equiv 1$ modulo p.
However, g has order $(p - 1)$ modulo p, so


$$(p - 1) \mid j(p - 1)/2, \text{ which implies } 2(p - 1) \mid j(p - 1).$$


Canceling $(p - 1)$ from both sides shows that j is even. Thus $a^{(p-1)/2} \equiv 1$
modulo p implies that $\left(\frac{a}{p}\right) = 1$.
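
Euler’s criterion gives a practical way to evaluate the Legendre symbol by modular exponentiation. The sketch below compares it with the definition by exhaustive search; both function names are illustrative choices.

```python
def legendre_by_definition(a, p):
    """Legendre symbol of a modulo the odd prime p, by searching for a square root."""
    a %= p
    if a == 0:
        return 0
    return 1 if any((x * x - a) % p == 0 for x in range(1, p)) else -1


def legendre_by_euler(a, p):
    """Legendre symbol via Euler's criterion: a^((p-1)/2) modulo p."""
    r = pow(a, (p - 1) // 2, p)
    # The residue p - 1 represents -1 modulo p.
    return -1 if r == p - 1 else r
```

The two functions agree for every integer a and every small odd prime tested, including negative a, since Python’s three-argument `pow` reduces the base modulo p.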


Corollary 3.14. The Legendre symbol satisfies


$$\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right).$$


That is, the Legendre symbol viewed as an arithmetic function


$$\left(\frac{\cdot}{p}\right) : \mathbb{Z} \to \{0, \pm 1\}$$
is completely multiplicative.

The proof follows immediately from Theorem 3.13 because the right-hand
side of Equation (3.6) is completely multiplicative.


Exercise 3.9. Suppose that p, q > 0 are odd primes with $q = 4p + 1$. Prove
that 2 is a primitive root modulo q. It follows that Artin’s conjecture (on p. 65)
for a = 2 would be proved if we knew there are infinitely many primes q of
the form $4p + 1$ where p is a prime.


Exercise 3.10. Prove Corollary 3.14 using concepts from group theory. (Hint:
The set of squares in the group $G = (\mathbb{Z}/p\mathbb{Z})^*$ forms a subgroup. This
subgroup has index 2 in G if p is odd; see Exercise 3.12 below.)


3.3 The Quadratic Reciprocity Law


The main result on quadratic residues is a reciprocity law. Gauss did many
calculations with quadratic residues and in particular studied whether there
might be a relation between p being a quadratic residue modulo q and q being
a quadratic residue modulo p when p and q are primes. Based on his extensive
calculations, he conjectured and then proved (in several ways) the following:
When one of p or q is congruent to 1 modulo 4, either both of the congruences


$$x^2 \equiv q \pmod{p}, \quad y^2 \equiv p \pmod{q},$$


are solvable or both are not. If both p and q are congruent to 3 modulo 4,
then one is solvable if and only if the other is not. This surprising result is of
great importance.


Theorem 3.15. Let p and q denote distinct odd primes. If $p \equiv q \equiv 3$ modulo 4, then


$$\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right).$$
If at least one of p or q is 1 modulo 4, then the symbols are equal.


Theorem 3.15 can be stated as a neater formula, and this is what we will
prove. If p and q are odd primes, then


$$\left(\frac{p}{q}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}} \left(\frac{q}{p}\right). \tag{3.7}$$


The even prime 2 has to be treated separately: The theorem below will be 
proved on p. 68. 


Theorem 3.16. If p is an odd prime, then


$$\left(\frac{2}{p}\right) = 1 \text{ if and only if } p \equiv \pm 1 \pmod{8}.$$
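
Both Equation (3.7) and Theorem 3.16 can be tested on small primes before they are proved. The sketch below evaluates Legendre symbols with Euler’s criterion (Theorem 3.13); the names and the range of primes are illustrative assumptions.

```python
def legendre(a, p):
    """Legendre symbol of a modulo the odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


# Odd primes below 200, found by trial division with odd divisors.
ODD_PRIMES = [p for p in range(3, 200, 2)
              if all(p % d for d in range(3, int(p ** 0.5) + 1, 2))]


def reciprocity_holds(p, q):
    """Check Equation (3.7) for a pair of distinct odd primes."""
    sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return legendre(p, q) == sign * legendre(q, p)


def theorem_3_16_holds(p):
    """Check that 2 is a quadratic residue modulo p exactly when p = +-1 (mod 8)."""
    return (legendre(2, p) == 1) == (p % 8 in (1, 7))
```

Every pair of distinct primes in `ODD_PRIMES` passes `reciprocity_holds`, and every prime in the list passes `theorem_3_16_holds`. This is evidence for the statements, not a substitute for the proofs that follow.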


Exercise 3.11. (a) Show that Theorem 3.16 can be written in the form


$$\left(\frac{2}{p}\right) = (-1)^{(p^2-1)/8}$$
for an odd prime p.


(b) Prove that $\left(\frac{-1}{p}\right) = (-1)^{(p-1)/2}$.

Exercise 3.12. A Diophantine equation with solutions in Z must have
solutions modulo p (that is, in Z/pZ) for all primes p.

(a) Show that the converse does not hold by proving that


$$(x^2 - 2)(x^2 - 3)(x^2 - 6) = 0$$


has a solution modulo p for every prime p but no integral solution.
(b) Show that $x^8 - 16 = 0$ has a solution modulo p for every prime p but no
integral solution.
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Exercise 3.13. (a) Show that Equation (3.7) is equivalent to Theorem 3.15. 
(b) Show that if p is an odd prime, then 


—2 
(=) = 1ifand only ifp=1lor3 (mod 8). 
Pp 


(c) Use the arithmetic of Z[,/—2] to show that the prime p can be written 
p=2’ 4+ 2y’, with z,y € Z, 
if and only if p= 1 or 3 modulo 8. 


Exercise 3.14. (a) Show that if p > 3 is a prime, then
\[ \left(\frac{-3}{p}\right) = 1 \text{ if and only if } p \equiv 1 \pmod{3}. \]


(b) Show that the map $x \mapsto x^3 + 2$ is a bijection on Z/pZ for any odd prime p
congruent to 2 modulo 3. Deduce that the equation $(x^2 + 3)(x^3 + 2) = 0$ has
a solution modulo q for any prime q but has no integral solutions.


Exercise 3.15. *Show that a monic polynomial $f \in \mathbb{Z}[x]$ of degree 4 or less
that has a solution modulo q for every prime q has an integral solution.


The proof of Theorem 3.16 acts as a dummy run for the proof of Theo- 
rem 3.15. The proofs given here are due to Serre. 


PROOF OF THEOREM 3.16. The prime p is odd, so $p^2 - 1 \equiv 0$ modulo 8. Let F
denote the field with $p^2$ elements. Then $F^*$ is a cyclic group of order $p^2 - 1$
by Theorem 3.3. Since $p^2 - 1$ is divisible by 8, this implies that $F^*$ contains
an element of order 8. Let $\zeta$ denote such an element. Let


\[ G = \zeta - \zeta^3 - \zeta^5 + \zeta^7. \tag{3.8} \]


Now $(\zeta^4)^2 = \zeta^8 = 1$, so $\zeta^4 = -1$ ($\zeta$ has order 8, so we cannot have $\zeta^4 = 1$)
and therefore
\[ \zeta^5 = -\zeta \quad\text{and}\quad \zeta^7 = -\zeta^3. \]
Therefore
\[ G = 2(\zeta - \zeta^3), \]
so $G^2 = 4(\zeta - \zeta^3)^2 = 4(\zeta^2 + \zeta^6 - 2\zeta^4)$. But $\zeta^4 + 1 = 0$ implies that $\zeta^6 + \zeta^2 = 0$.
Therefore
\[ G^2 = 8. \]

Recall that we are working in the field F so that 8 denotes not only the
integer 8 but also the sum $1_F + \cdots + 1_F$ (seven additions), where $1_F$ is the
multiplicative identity in $F^*$.

The proof of the theorem depends on finding two distinct expressions
for $G^p$.

First expression for $G^p$:
\begin{align*}
G^p &= G\,G^{p-1} \\
    &= G\,(G^2)^{(p-1)/2} \\
    &= G\,8^{(p-1)/2} && \text{because } G^2 = 8 \\
    &= G\left(\frac{8}{p}\right) && \text{by Euler's criterion} \\
    &= G\left(\frac{2}{p}\right) && \text{by Corollary 3.14.}
\end{align*}
Second expression for $G^p$:


Define a function $f : \mathbb{Z} \to \{0, \pm 1\}$ by taking $f(j)$ to be 0 when j is even and $(-1)^{(j^2-1)/8}$
when j is odd. Notice that


\[ f(j) = 1 \text{ if and only if } j \equiv \pm 1 \pmod{8}. \]


The second expression for $G^p$ is
\[ G^p = f(p)G. \tag{3.9} \]


Equate the two expressions for $G^p$ to obtain

\[ G\left(\frac{2}{p}\right) = f(p)G. \]

Now G is not zero in F (because $G^2 = 8$), so cancelling gives


\[ \left(\frac{2}{p}\right) = f(p) = 1 \text{ if and only if } p \equiv \pm 1 \pmod{8}. \]


The field F has characteristic p, so $(a + b)^p = a^p + b^p$ in F because all
binomial coefficients apart from the end ones are divisible by p. (A similar
argument was used in the proof of Fermat’s Little Theorem on p. 24.) Similarly,
by induction,

\[ (a_1 + \cdots + a_n)^p = a_1^p + \cdots + a_n^p. \]


Using Equation (3.8) and the definition of f,


\begin{align*}
G &= f(1)\zeta + f(3)\zeta^3 + f(5)\zeta^5 + f(7)\zeta^7 \\
  &= f(0) + f(1)\zeta + f(2)\zeta^2 + \cdots + f(7)\zeta^7.
\end{align*}


Thus
\[ G^p = \Bigl(\sum_{j=0}^{7} f(j)\zeta^{j}\Bigr)^p = \sum_{j=0}^{7} f(j)\zeta^{jp}. \tag{3.10} \]


Note that f(j) does not need to be raised to the power p because $f(j)^p = f(j)$.


Lemma 3.17. For all $j \in \mathbb{Z}$, $f(p)f(jp) = f(j)$.

Assuming this lemma for the moment, Equation (3.10) gives
\[ G^p = \sum_{j=0}^{7} f(j)\zeta^{jp} = \sum_{j=0}^{7} f(p)f(jp)\zeta^{jp} = f(p)\sum_{j=0}^{7} f(jp)\zeta^{jp}. \]


Now, for fixed p, jp modulo 8 runs through 0,...,7 as j does, so this shows
that $G^p = f(p)G$, proving Equation (3.9).
All we need to do now is prove Lemma 3.17, which states that


\[ f(p)f(pj) = f(j). \]


Clearly, this is true if j is even, so suppose that j is odd. The statement is true
for any odd pair j and p. This can be checked by examining all the possibilities
for j and p modulo 16. Alternatively, notice that


\begin{align*}
(-1)^{((jp)^2-1)/8} &= (-1)^{((jp)^2 - p^2 + p^2 - 1)/8} \\
                    &= \bigl((-1)^{p^2}\bigr)^{(j^2-1)/8}\,(-1)^{(p^2-1)/8} \\
                    &= (-1)^{(j^2-1)/8}\,(-1)^{(p^2-1)/8}.
\end{align*}


This shows that f(jp) = f(j)f(p), and Lemma 3.17 follows by multiplying
both sides by f(p) (whose square is 1).
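The case analysis modulo 16 mentioned in the proof is easy to carry out by machine. The sketch below defines f exactly as in the text, checks the identity of Lemma 3.17 for every odd p and every j modulo 16, and then compares f(p) with the Legendre symbol (2/p) for small odd primes. The helper names `legendre`, `lemma_ok`, and `theorem_ok` are assumptions introduced for this illustration.

```python
def f(j):
    """f(j) = 0 for even j and (-1)^((j^2 - 1)/8) for odd j."""
    if j % 2 == 0:
        return 0
    return -1 if ((j * j - 1) // 8) % 2 else 1


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, by Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


# Lemma 3.17: f(p) f(jp) = f(j) for odd p; f only depends on its argument mod 16.
lemma_ok = all(f(p) * f(j * p) == f(j)
               for p in range(1, 16, 2) for j in range(16))

# Theorem 3.16: (2/p) = f(p) for odd primes p (checked here below 500).
small_primes = [n for n in range(3, 500)
                if all(n % k for k in range(2, int(n ** 0.5) + 1))]
theorem_ok = all(legendre(2, p) == f(p) for p in small_primes)

print(lemma_ok, theorem_ok)
```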


Finally, we come to the proof of the Quadratic Reciprocity Law (Theo- 
rem 3.15). Theorem 3.3 will again play a pivotal role. 


PROOF OF THEOREM 3.15. Consider the field F with $p^{q-1}$ elements. Then $F^*$
is a cyclic group with order $p^{q-1} - 1$ by Theorem 3.3. By Fermat’s Little
Theorem, $p^{q-1} \equiv 1$ modulo q. Thus there is an element $\zeta$ in $F^*$ whose order
is q. Define
\[ G = \sum_{j=1}^{q-1} \left(\frac{j}{q}\right)\zeta^{j}. \tag{3.11} \]


The sum G is called a Gauss sum because Gauss seems to have been the first
person to systematically study sums such as these.
The proof works as before by finding two different expressions for $G^p$. We
claim first that
\[ G^2 = (-1)^{(q-1)/2}\,q. \tag{3.12} \]


Using this, we can derive our first expression for $G^p$.
First expression for $G^p$:


\begin{align*}
G^p = G\,G^{p-1} = G\,(G^2)^{(p-1)/2} &= G\bigl((-1)^{(q-1)/2} q\bigr)^{(p-1)/2} \\
 &= G\,(-1)^{\frac{q-1}{2}\cdot\frac{p-1}{2}}\,q^{(p-1)/2}
  = G\,(-1)^{\frac{q-1}{2}\cdot\frac{p-1}{2}}\left(\frac{q}{p}\right).
\end{align*}

Second expression for $G^p$:
We claim that


\[ G^p = \left(\frac{p}{q}\right) G. \tag{3.13} \]


Equating the two expressions gives


\[ G\,(-1)^{\frac{q-1}{2}\cdot\frac{p-1}{2}}\left(\frac{q}{p}\right) = \left(\frac{p}{q}\right) G. \]


We can cancel G because it is not zero in F (since its square is $(-1)^{(q-1)/2} q$,
which is not zero in F); the Quadratic Reciprocity Law follows at once.
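The next step is to show Equation (3.13). By the Binomial Theorem,


\[ G^p = \Bigl(\sum_{j=1}^{q-1}\left(\frac{j}{q}\right)\zeta^{j}\Bigr)^p = \sum_{j=1}^{q-1}\left(\frac{j}{q}\right)\zeta^{jp} \]


because $\left(\frac{j}{q}\right)^p = \left(\frac{j}{q}\right)$. By the multiplicativity of the Legendre symbol (Corollary 3.14),
the right-hand side is


\[ \left(\frac{p}{q}\right)\sum_{j=1}^{q-1}\left(\frac{jp}{q}\right)\zeta^{jp} \]


since $\left(\frac{p}{q}\right)$ is $\pm 1$. Now jp modulo q runs through 1,...,q−1 as j does, so the
second expression for $G^p$ can be written as in Equation (3.13).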
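The only tricky part of this proof is to evaluate $G^2$. Expanding the product
for $G^2$ gives
\[ G^2 = \sum_{j=1}^{q-1}\left(\frac{j}{q}\right)\zeta^{j}\;\sum_{k=1}^{q-1}\left(\frac{-k}{q}\right)\zeta^{-k}, \]
noting that as k runs through 1,...,q−1, so does −k modulo q.
By the multiplicativity of the Legendre symbol, $\left(\frac{-k}{q}\right) = \left(\frac{-1}{q}\right)\left(\frac{k}{q}\right)$.
Pulling the factor $\left(\frac{-1}{q}\right)$ out to the front and replacing k by jk in the second
sum gives
\[ G^2 = \left(\frac{-1}{q}\right)\sum_{j=1}^{q-1}\sum_{k=1}^{q-1}\left(\frac{j}{q}\right)\left(\frac{jk}{q}\right)\zeta^{j-jk}. \]
By the multiplicativity of the Legendre symbol $\left(\frac{j}{q}\right)\left(\frac{jk}{q}\right) = \left(\frac{k}{q}\right)$, so
\[ G^2 = \left(\frac{-1}{q}\right)\sum_{j=1}^{q-1}\sum_{k=1}^{q-1}\left(\frac{k}{q}\right)\zeta^{j(1-k)}. \]

Next we add zero to both sides of this equation in a special form. On the
right-hand side, add
\[ \left(\frac{-1}{q}\right)\sum_{k=1}^{q-1}\left(\frac{k}{q}\right)\zeta^{0(1-k)}. \tag{3.14} \]

This expression is zero because half of the nonzero residues modulo q are
squares, so half of the values of the symbol are 1 and the other half are −1.
Thus
\[ G^2 = \left(\frac{-1}{q}\right)\sum_{j=0}^{q-1}\sum_{k=1}^{q-1}\left(\frac{k}{q}\right)\zeta^{j(1-k)}. \]
This double sum can be rearranged to give


\[ G^2 = \left(\frac{-1}{q}\right)\sum_{k=1}^{q-1}\left(\frac{k}{q}\right)\sum_{j=0}^{q-1}\zeta^{j(1-k)}. \tag{3.15} \]

By Euler’s criterion (Theorem 3.13), the term with k = 1 contributes
\[ \left(\frac{-1}{q}\right) q = (-1)^{(q-1)/2}\,q \]
to $G^2$.

We claim that all the other terms (those with $k \neq 1$) in Equation (3.15)
contribute nothing. Assume that $k \neq 1$, and write $\eta = \zeta^{1-k}$. Then $\eta$ is a
nontrivial qth root of 1. We claim that


\[ S = 1 + \eta + \cdots + \eta^{q-1} = 0. \]
To see this, notice that


\[ \eta S = \eta + \eta^2 + \cdots + \eta^{q-1} + \eta^{q} = 1 + \eta + \cdots + \eta^{q-1} = S, \]


which shows that S = 0 since $\eta \neq 1$.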
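The evaluation of the square of the Gauss sum can also be confirmed by exact integer arithmetic. The sketch below is an illustration under a stated assumption: instead of the finite field F of the proof, it works with integer polynomials in a symbol z subject only to the relations z^q = 1 and 1 + z + ... + z^(q−1) = 0, which are the only properties of the qth root of unity that the argument uses. The function names are hypothetical.

```python
def legendre(a, q):
    """Legendre symbol (a/q) for an odd prime q, by Euler's criterion."""
    r = pow(a % q, (q - 1) // 2, q)
    return -1 if r == q - 1 else r


def gauss_sum_squared(q):
    """Return G^2 as an integer, where G = sum_{j=1}^{q-1} (j/q) z^j.

    Coefficient lists are indexed by the exponent of z modulo q. Raises
    ValueError if G^2 does not reduce to a constant.
    """
    g = [legendre(j, q) for j in range(q)]        # g[0] = 0
    square = [0] * q
    for i in range(q):
        for j in range(q):
            square[(i + j) % q] += g[i] * g[j]    # uses z^q = 1
    # Subtract square[q-1] * (1 + z + ... + z^(q-1)), which is zero,
    # so that the coefficient of z^(q-1) vanishes.
    reduced = [c - square[q - 1] for c in square]
    if any(reduced[1:]):
        raise ValueError("G^2 is not an integer")
    return reduced[0]


for q in (3, 5, 7, 11, 13):
    print(q, gauss_sum_squared(q), (-1) ** ((q - 1) // 2) * q)
```

The two printed values in each row should agree, reflecting the identity claimed for the square of the Gauss sum.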


Apart from being a very beautiful result, the Quadratic Reciprocity Law 
is important in that it allows the Legendre symbol to be rapidly computed. 
This is useful in many areas, including primality testing (see, for example, 
Section 12.6). 


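Example 3.18. Compute the Legendre symbol $\left(\frac{91}{167}\right)$ using the Quadratic
Reciprocity Law. First notice that


\[ \left(\frac{91}{167}\right) = \left(\frac{7}{167}\right)\left(\frac{13}{167}\right) = -\left(\frac{167}{7}\right)\left(\frac{167}{13}\right) = -\left(\frac{6}{7}\right)\left(\frac{11}{13}\right). \]

The problem has become more manageable and is readily finished by noting
that
\[ \left(\frac{6}{7}\right) = \left(\frac{-1}{7}\right) = -1 \]
and
\[ \left(\frac{11}{13}\right) = \left(\frac{13}{11}\right) = \left(\frac{2}{11}\right) = -1. \]


It follows that $\left(\frac{91}{167}\right) = -1$.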
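The reduction steps used in this kind of example can be automated. The following sketch is one standard way to do so, with an assumption that should be stated clearly: it uses the Jacobi symbol, an extension of the Legendre symbol to odd composite moduli that is not introduced in the text, so that no factorization is needed when the roles of the two entries are swapped. For a prime modulus the result agrees with the Legendre symbol. The function names are illustrative.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; equals the Legendre symbol if n is prime."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:                 # remove factors of 2 using the rule for (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                       # reciprocity step: swap the entries
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def legendre_euler(a, p):
    """Legendre symbol by Euler's criterion, used here only as a cross-check."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


print(jacobi(91, 167), legendre_euler(91, 167))
```

The loop only performs divisions with remainder, much like the Euclidean Algorithm, which is why reciprocity makes the symbol quick to compute even for large entries.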


Exercise 3.16. Evaluate the Legendre symbols (34), (37), (43). 


3.4 Quadratic Rings 


It is tempting to conclude that we are now in a position to characterize those
primes p that can be written in the form


\[ p = x^2 + dy^2, \]


with $x, y \in \mathbb{Z}$ for a given $d \in \mathbb{Z}$. Unfortunately, this problem is a little more
complicated than it first appears. The methods of this chapter are applicable
only if the ring $\mathbb{Z}[\sqrt{-d}]$ is Euclidean, and this is not always the case. The
structure of $\mathbb{Z}[\sqrt{-d}]$ is quite subtle, and some basic questions about these
rings are still open.


Exercise 3.17. Show that the Fundamental Theorem of Arithmetic does not
hold in the ring $\mathbb{Z}[\sqrt{-3}]$ by considering the two factorizations


\[ 2 \cdot 2 = 4 = (1 + \sqrt{-3})(1 - \sqrt{-3}) \]
of 4. (Hint: Show that 2 cannot be a prime in this ring.)


Example 3.19. Consider the equation

\[ x^2 + 5y^2 = p. \]


In order to understand this, we expect to use the Quadratic Reciprocity Law
to solve
\[ T^2 + 5 \equiv 0 \pmod{p} \]
for T.


Exercise 3.18. Show that $\left(\frac{-5}{p}\right) = 1$ if and only if $p \equiv 1, 3, 7$, or 9 modulo 20.


In particular, the congruence $T^2 + 5 \equiv 0$ modulo 7 has a solution: it is
easily found that T = 3 is a solution. However, the equation


\[ x^2 + 5y^2 = 7 \]


has no solution in integers.


Exercise 3.19. Show that the Fundamental Theorem of Arithmetic does not
hold in the ring $\mathbb{Z}[\sqrt{-5}]$.


Exercise 3.20. Show that there are infinitely many rings $\mathbb{Z}[\sqrt{-d}]$, where d is
a positive square-free integer, in which the Fundamental Theorem of
Arithmetic does not hold.


The Quadratic Reciprocity Law is a useful tool for understanding when
quadratic congruences have no solutions. For example, Exercise 3.18 shows
that we will never obtain a solution to the equation $x^2 + 5y^2 = p$ if p is any
prime that is congruent to 11 modulo 20. More than that, it can predict the
existence of solutions when the equation cannot be checked easily by hand.
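For small primes the equation can simply be searched. The sketch below is a brute-force check, with hypothetical function names, that illustrates two points made above: the congruence T^2 + 5 ≡ 0 modulo 7 is solvable although 7 is not of the form x^2 + 5y^2, and no prime congruent to 11 modulo 20 in the tested range is of that form.

```python
from math import isqrt


def represented(p, d):
    """True if p = x^2 + d*y^2 for some integers x, y (assumes d > 0, p >= 0)."""
    y = 0
    while d * y * y <= p:
        rest = p - d * y * y
        if isqrt(rest) ** 2 == rest:      # rest is a perfect square x^2
            return True
        y += 1
    return False


roots_mod_7 = [t for t in range(7) if (t * t + 5) % 7 == 0]
print(roots_mod_7, represented(7, 5), represented(29, 5))

primes_11_mod_20 = [n for n in range(2, 500)
                    if n % 20 == 11
                    and all(n % k for k in range(2, isqrt(n) + 1))]
print(any(represented(p, 5) for p in primes_11_mod_20))
```

The prime 29 = 3^2 + 5·2^2 is included as a positive case so that the search is seen to succeed when a representation exists.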


Exercise 3.21. (a) Show that $\mathbb{Z}[\sqrt{2}]$ is Euclidean with respect to the norm
$N(x + y\sqrt{2}) = x^2 - 2y^2$.
(b) Show that if p is an odd prime, then the equation


\[ x^2 - 2y^2 = p \]


has a solution whenever $p \equiv \pm 1$ modulo 8 but has no solutions when $p \equiv \pm 3$
modulo 8. This is (by now) a routine use of the Quadratic Reciprocity Law
together with the Euclidean property of $\mathbb{Z}[\sqrt{2}]$.


Exercise 3.22. When d > 1 is square-free, the ring $\mathbb{Z}[\sqrt{d}]$ has infinitely many
units. Deduce that if
\[ x^2 - dy^2 = p \]


has a solution in integers, then it has infinitely many solutions. (The first
part of the exercise will be covered in the next section, but try to find a proof
yourself.)


The statement in the first part of the exercise is not easy. The equation
\[ x^2 - dy^2 = 1 \]


is often called Pell’s Equation after the seventeenth-century mathematician
John Pell. This is now thought to be a misattribution. Brahmagupta seems
to have known how to solve the equation long before Pell. In the twelfth
century, Bhaskaracharya discovered the simplest of the infinitely many nontrivial
solutions when d = 61, namely


\[ x = 1766319049, \qquad y = 226153980. \]
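The solution quoted for d = 61 is easily verified with exact integer arithmetic, and a naive search shows why this case is remarkable. The sketch below is only an illustration: the function names and the search limit `y_limit` are assumptions, and the brute-force search is practical only when the smallest solution is small.

```python
from math import isqrt


def is_pell_solution(x, y, d):
    """True if x^2 - d*y^2 = 1."""
    return x * x - d * y * y == 1


def smallest_pell(d, y_limit=10**6):
    """Smallest solution (x, y) with 1 <= y <= y_limit, or None if there is none."""
    for y in range(1, y_limit + 1):
        target = 1 + d * y * y
        x = isqrt(target)
        if x * x == target:
            return x, y
    return None


print(is_pell_solution(1766319049, 226153980, 61))
print(smallest_pell(2), smallest_pell(7), smallest_pell(13))
```

For d = 61 the smallest y already has nine digits, so a search of this kind would need hundreds of millions of steps, which gives some feeling for the scale of the twelfth-century discovery.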


3.5 Units in $\mathbb{Z}[\sqrt{d}]$, $d > 0$


For d < 0, the ring $R = \mathbb{Z}[\sqrt{d}]$ has only finitely many units, so we assume
in this section that d > 0 is a fixed square-free integer. Write {t} for the
fractional part of a real number t.


Lemma 3.20. There are infinitely many coprime pairs of integers p and q > 0
with
\[ |q\sqrt{d} - p| < \frac{1}{q}. \]


PROOF. Let Q > 1 denote an integer. Divide the interval [0,1) into Q
subintervals [0,1/Q), [1/Q,2/Q),... and consider the (Q + 1) numbers
\[ 0, \{\sqrt{d}\}, \{2\sqrt{d}\}, \ldots, \{Q\sqrt{d}\}. \]


There are (Q + 1) of them since $\sqrt{d}$ is irrational, so at least two must lie in a
single one of the Q intervals of the form [a/Q, (a + 1)/Q) by the pigeonhole
principle. Thus there must be integers $q_1, q_2$ with $0 \le q_1 < q_2 \le Q$ such that


\[ \bigl|\{q_2\sqrt{d}\} - \{q_1\sqrt{d}\}\bigr| < 1/Q. \]


Unwinding the definition of the fractional part, this means that there are
integers $p_1$ and $p_2$ with


\[ |q_2\sqrt{d} - p_2 - q_1\sqrt{d} + p_1| = |(q_2 - q_1)\sqrt{d} - (p_2 - p_1)| < 1/Q. \]


The proof is now finished by choosing $Q \ge q = q_2 - q_1 > 0$ and $p = p_2 - p_1$.
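The pigeonhole argument is constructive and can be followed step by step. The sketch below is a minimal illustration with hypothetical function names: it sorts the multiples j√d, for j = 0,...,Q, into the Q subintervals by their fractional parts and stops at the first collision. To avoid rounding errors, the subinterval index floor(Q{j√d}) is computed with integer square roots. The pair returned is not necessarily coprime; dividing p and q by their greatest common divisor only improves the approximation.

```python
from math import isqrt


def dirichlet_pair(d, Q):
    """Return (p, q) with 0 < q <= Q and |q*sqrt(d) - p| < 1/Q, for nonsquare d > 0."""
    seen = {}                                   # subinterval index -> multiplier j
    for j in range(Q + 1):
        # floor(Q * {j*sqrt(d)}) = floor(Q*j*sqrt(d)) - Q*floor(j*sqrt(d))
        bucket = isqrt(Q * Q * j * j * d) - Q * isqrt(j * j * d)
        if bucket in seen:                      # two fractional parts share a subinterval
            q1 = seen[bucket]
            q = j - q1
            p = isqrt(j * j * d) - isqrt(q1 * q1 * d)
            return p, q
        seen[bucket] = j
    raise ValueError("no collision found; is d a perfect square?")


def is_within(d, p, q, Q):
    """Exact test of |q*sqrt(d) - p| < 1/Q, rewritten as (Qp-1)^2 < Q^2 q^2 d < (Qp+1)^2."""
    n = Q * Q * q * q * d
    return (Q * p - 1) ** 2 < n < (Q * p + 1) ** 2


p, q = dirichlet_pair(2, 10)
print(p, q, is_within(2, p, q, 10))
```

With d = 2 and Q = 10 the first collision occurs between the multipliers 0 and 5, giving the approximation 7/5 to √2.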


This was originally proved by Dirichlet and is the starting point for a deep 
subject known as Diophantine approximation. This subject has to do with 
how well an irrational number can be approximated by rational numbers. 


Exercise 3.23. Show that there is a constant C > 0 such that
\[ \frac{C}{q} < |q\sqrt{d} - p| \]


for all integers p and q > 0.


Exercise 3.24. More generally, show that if $\alpha$ is algebraic of degree k > 1
(that is, $\alpha$ satisfies an irreducible polynomial of degree k with integer
coefficients), then there is a constant $C(\alpha) > 0$ such that


\[ \frac{C(\alpha)}{q^{k-1}} < |q\alpha - p| \]


for all integers p and q > 0.


Theorem 3.21. If d > 1 is a square-free integer, then
\[ x^2 - dy^2 = 1 \]


has infinitely many solutions in integers (x, y). Moreover, each solution
corresponds to a unit in $R = \mathbb{Z}[\sqrt{d}]$ with norm 1. Any unit with norm 1 has the
form $\pm u^n$ for $n \in \mathbb{Z}$, where u is a fixed unit with norm 1.


PROOF. Using Lemma 3.20, choose p, q > 0 with
\[ |q\sqrt{d} - p| < \frac{1}{q}. \tag{3.16} \]


Then
\[ p - \frac{1}{q} < q\sqrt{d} < p + \frac{1}{q}, \]
so
\[ |q\sqrt{d} + p| < 2q\sqrt{d} + \frac{1}{q}. \tag{3.17} \]
Multiplying the inequalities (3.16) and (3.17) shows that
\[ |p^2 - dq^2| < 1 + 2\sqrt{d}. \]


We would like to show that the left-hand side is 1 for infinitely many
pairs (p, q). We cannot deduce this at once, but notice that the right-hand
side is a uniform bound (independent of p and q), so there must be an
integer e with

\[ |e| < 1 + 2\sqrt{d} \]


such that for infinitely many pairs p and q,
\[ p^2 - dq^2 = e. \]


There also must be infinitely many distinct pairs (p, q) and (p', q') such
that $p \equiv p'$ modulo e, $q \equiv q'$ modulo e, and


\[ pp' - dqq' \equiv p^2 - dq^2 \equiv 0 \pmod{e}. \]


Given such a distinct pair, write
\[ pp' - dqq' = xe \quad\text{and}\quad pq' - p'q = ye \]


for integers x and y. Then


\begin{align*}
x^2 - dy^2 &= \Bigl(\frac{pp' - dqq'}{e}\Bigr)^2 - d\Bigl(\frac{pq' - p'q}{e}\Bigr)^2 \\
           &= \frac{1}{e^2}\bigl(p^2(p'^2 - dq'^2) - dq^2(p'^2 - dq'^2)\bigr) \\
           &= \frac{1}{e^2}(p^2 - dq^2)(p'^2 - dq'^2) = 1,
\end{align*}
so there are infinitely many solutions.
To prove the claim about the structure of the unit group, consider the map


\[ L : U(R) \to \mathbb{R}^2 \]


defined by
\[ L(x + y\sqrt{d}) = \bigl(\log|x + y\sqrt{d}|,\ \log|x - y\sqrt{d}|\bigr). \]


The image is a nontrivial discrete subgroup (see Exercise 3.25 below) of $\mathbb{R}^2$,
so it must have rank 1 or 2 by Exercise 3.26.
On the other hand, $x^2 - dy^2 = 1$ implies that


\[ \log|x - y\sqrt{d}| + \log|x + y\sqrt{d}| = 0, \]


so the image of L lies in a one-dimensional subspace of $\mathbb{R}^2$ and therefore
the rank must be 1. This is enough to prove the claim: The image set must
be $\{n(v, -v) \mid n \in \mathbb{Z}\}$ for some nonzero $v \in \mathbb{R}$ and the claim follows with u
satisfying $L(u) = (v, -v)$.


Exercise 3.25. Explain why the image of L is a discrete subgroup of R? in 
the proof of Theorem 3.21. 


Exercise 3.26. Prove that a discrete subgroup of R” has rank less than or 
equal to n. 


Finding u is in general a nontrivial problem. In some books you will see the 
method of continued fractions used, which does give an algorithm. The method 
used here, which is a first step into the subject called geometry of numbers, 
was chosen for two reasons. First, using a generalization of this argument, one 
can go on to analyze the units of the ring of algebraic integers (see p. 84) inside 
a number field. This always turns out to be finitely generated with a rank that 
is easily computed from basic data about the number field. The method using 
continued fractions does not generalize. Second, the geometry of numbers, 
when worked out fully, really represents an application of topological ideas. If 
you ask what kind of shapes in space must contain lattice points, you quickly 
find yourself resorting to ideas such as compactness and connectedness as well 
as convexity. 

A beautiful fact about the solutions of the equation in Theorem 3.21 is
that they form a group. Moreover, the multiplication law on elements $x + y\sqrt{d}$
can be expressed in terms of polynomial functions on the coordinates x and y
as follows:


\[ (x_1 + y_1\sqrt{d})(x_2 + y_2\sqrt{d}) = (x_1x_2 + d\,y_1y_2) + (x_1y_2 + x_2y_1)\sqrt{d}. \]
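The multiplication law gives a practical way to generate new solutions from a known one. The sketch below, with illustrative function names, represents a solution as a pair (x, y), composes pairs by the polynomial formulas above, and lists the first few powers of the solution (3, 2) of x^2 − 2y^2 = 1.

```python
def compose(s, t, d):
    """Product of x1 + y1*sqrt(d) and x2 + y2*sqrt(d), as a pair of coordinates."""
    (x1, y1), (x2, y2) = s, t
    return (x1 * x2 + d * y1 * y2, x1 * y2 + x2 * y1)


def powers(u, d, count):
    """The first `count` powers u, u^2, ..., starting from the identity (1, 0)."""
    result, current = [], (1, 0)
    for _ in range(count):
        current = compose(current, u, d)
        result.append(current)
    return result


solutions = powers((3, 2), 2, 5)
print(solutions)
print(all(x * x - 2 * y * y == 1 for x, y in solutions))
```

Each composition multiplies the norms, so every power of a norm 1 element again has norm 1; this is the group structure in computational form.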


In Chapter 5, we will encounter a whole family of Diophantine equations in
two variables whose solutions form groups, and for which the multiplication
law can be expressed in terms of rational functions on the coordinates.


3.6 Quadratic Forms


The subject of quadratic forms is a large one, and we will merely introduce it
here via a classical proof of a result due to Lagrange and some comment on
Gauss’ Theorem. Consider the Diophantine equation


\[ ax^2 + bxy + cy^2 = n \tag{3.18} \]


in which we seek an integral solution (x, y) for given $a, b, c, n \in \mathbb{Z}$. The
discriminant $\Delta$ of the quadratic form $ax^2 + bxy + cy^2$ is defined to be


\[ \Delta = b^2 - 4ac. \]


Just as with the Pythagorean equation, there are several elementary
reductions to be made.

First, if gcd(a, b, c) = d > 1, then d must also divide n, and Equation (3.18)
becomes


\[ (a/d)x^2 + (b/d)xy + (c/d)y^2 = n/d, \]


so without loss of generality we may assume that gcd(a, b, c) = 1.
Second, if gcd(x, y) = e > 1, then


\[ a(x/e)^2 + b(x/e)(y/e) + c(y/e)^2 = n/e^2, \]


so we may assume without loss of generality that x and y are coprime. As in
the Pythagorean case, call solutions (x, y) with gcd(x, y) = 1 primitive.
Third, if the discriminant $\Delta$ is a square, then the equation


\[ at^2 + bt + c = 0 \]


has rational solutions that may be written $u_1/v_1$ and $u_2/v_2$ in lowest terms,
with $v_1$ and $v_2$ positive, so Equation (3.18) may be written as


\[ a(v_1x - u_1y)(v_2x - u_2y) = nv_1v_2. \]


This is not really a quadratic equation, but a pair of linear ones. For each
integral pair (r, s) with $ars = nv_1v_2$, solve the equations
\begin{align*}
v_1x - u_1y &= r, \\
v_2x - u_2y &= s.
\end{align*}


Integral solutions to this pair of equations, if there are any, solve
Equation (3.18).


Exercise 3.27. Let $\rho = \dfrac{1 - (-1)^{\Delta}}{2}$. Show that $\dfrac{\Delta - \rho}{4}$ is an integer.


Theorem 3.22. [LAGRANGE] Let $\Delta$ be a nonsquare integer. Then there is a
quadratic form $ax^2 + bxy + cy^2$ of discriminant $\Delta$ with a primitive solution to


\[ ax^2 + bxy + cy^2 = n \]


if and only if the congruence


\[ z^2 + \rho z - \Bigl(\frac{\Delta - \rho}{4}\Bigr) \equiv 0 \pmod{n} \tag{3.19} \]
has a solution z.


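PROOF. Assume that $(\alpha, \beta)$ is a primitive integral solution to Equation (3.18).
By Theorem 1.23, there are integers $\gamma, \delta$ with


\[ \alpha\gamma + \beta\delta = 1. \]
Let
\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \alpha & -\delta \\ \beta & \gamma \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}. \]
Notice that $\det\begin{pmatrix} \alpha & -\delta \\ \beta & \gamma \end{pmatrix} = 1$, so this matrix is an invertible transformation
on $\mathbb{Z}^2$.
Let
\[ r = -a\alpha\delta + c\beta\gamma + \frac{b(\alpha\gamma - \beta\delta) - \rho}{2} \]
and


\[ s = a\delta^2 - b\gamma\delta + c\gamma^2. \]


Notice that by our choice of $\rho$, both r and s are integers. Now express
Equation (3.18) in the variables X and Y to obtain


\begin{align*}
& a(\alpha X - \delta Y)^2 + b(\alpha X - \delta Y)(\beta X + \gamma Y) + c(\beta X + \gamma Y)^2 \\
&\quad = X^2(a\alpha^2 + b\alpha\beta + c\beta^2) + XY\bigl(-2a\alpha\delta + b(\alpha\gamma - \beta\delta) + 2c\beta\gamma\bigr) + Y^2(a\delta^2 - b\gamma\delta + c\gamma^2) \\
&\quad = nX^2 + (2r + \rho)XY + sY^2 = n.
\end{align*}


The equation
\[ nX^2 + (2r + \rho)XY + sY^2 = n \tag{3.20} \]


has the solution X = 1, Y = 0, corresponding to


\[ \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \alpha & -\delta \\ \beta & \gamma \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \]
The discriminant of Equation (3.20) is


\[ (2r + \rho)^2 - 4sn = \Delta, \tag{3.21} \]
so
\[ r^2 + \rho r - \Bigl(\frac{\Delta - \rho}{4}\Bigr) = sn, \]


showing that r is a solution of the congruence (3.19).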

Conversely, assume that r is a solution to the congruence (3.19). Then
solving Equation (3.21) gives an integer s and hence the integer solution X =
1, Y = 0 to Equation (3.20). Writing x, y in place of X, Y gives the primitive
integral solution x = 1, y = 0 to the equation


\[ nx^2 + (2r + \rho)xy + sy^2 = n, \]


whose quadratic form has discriminant $\Delta$.


Example 3.23. Let a = 1, b = 0, c = 5, and n = 7, so $\rho = 0$ and $\Delta = -20$.
Theorem 3.22 applies to say that there is a quadratic form representing 7 with
discriminant −20 if and only if


\[ z^2 + 5 \equiv 0 \pmod{7} \]


has a solution. We know that −5 is a quadratic residue modulo 7, so there is
such a form. The proof constructs the form


\[ 7x^2 + 6xy + 2y^2, \]
and of course this represents 7 when x = 1 and y = 0.
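The construction in the converse direction of the proof is entirely mechanical, so it can be sketched in a few lines. The code below is an illustration under stated assumptions: it takes the discriminant to be congruent to 0 or 1 modulo 4, sets rho equal to the parity of the discriminant (0 if even, 1 if odd), solves the congruence by trying every residue r modulo n, and returns the coefficient triples (n, 2r + rho, s). The function name is hypothetical.

```python
def forms_representing(n, disc):
    """Forms (n, 2r + rho, s) of discriminant `disc` built from roots r of the congruence.

    Assumes disc is congruent to 0 or 1 modulo 4; rho is the parity of disc.
    """
    if disc % 4 not in (0, 1):
        raise ValueError("disc must be congruent to 0 or 1 modulo 4")
    rho = disc % 2
    constant = (disc - rho) // 4
    forms = []
    for r in range(n):
        value = r * r + rho * r - constant
        if value % n == 0:                       # r solves the congruence modulo n
            forms.append((n, 2 * r + rho, value // n))
    return forms


forms = forms_representing(7, -20)
print(forms)
print(all(b * b - 4 * a * c == -20 for a, b, c in forms))
```

For n = 7 and discriminant −20 the root r = 3 yields the coefficients (7, 6, 2), in agreement with the example, and the second root r = 4 yields another form of the same discriminant.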


Exercise 3.28. Prove that any odd prime congruent to 1 modulo 4 is a sum 
of two integer squares using Theorem 3.22 (cf. Theorem 2.6 where this was 
proved using different methods). 


The next exercises explore the change of variables (X, Y) to (x, y) used
in the proof of Theorem 3.22. An integer n is said to be represented by an
integral quadratic form Q if there are integers x and y with Q(x, y) = n.


Exercise 3.29. Let
\[ P(x, y) = ax^2 + bxy + cy^2 \]


and
\[ Q(X, Y) = AX^2 + BXY + CY^2 \]


be binary quadratic forms with integer coefficients. Say that P and Q are
equivalent, written $P \sim Q$, if there is an integral change of variables


\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \alpha & -\delta \\ \beta & \gamma \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix} \]
with $\det\begin{pmatrix} \alpha & -\delta \\ \beta & \gamma \end{pmatrix} = 1$ such that


\[ P(x, y) = Q(X, Y). \]


(a) Show that $\sim$ is an equivalence relation.

(b) Show that equivalent quadratic forms have the same discriminant.

(c) Show that equivalent quadratic forms represent the same set of integers:
If $P \sim Q$, then


\[ \{P(x, y) \mid x, y \in \mathbb{Z}\} = \{Q(x, y) \mid x, y \in \mathbb{Z}\}. \]


Exercise 3.30. Show that a prime number p is represented by a quadratic
form P if and only if there is a quadratic form equivalent to P of the form


\[ px^2 + dxy + ey^2 \]
for integers d and e.


Let $P(x, y) = ax^2 + bxy + cy^2$ be a quadratic form. Then P is
positive-definite if P(x, y) > 0 for all x and y not both zero and is reduced if either


\[ c > a \text{ and } -a < b \le a \]


or
\[ c = a \text{ and } 0 \le b \le a. \]
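Reduced forms of a given negative discriminant can be listed by a short search, which may help when experimenting with the exercises that follow. The sketch below rests on two assumptions that should be made explicit: the reduction conditions are read as c > a with −a < b ≤ a, or c = a with 0 ≤ b ≤ a, and the search is bounded using the inequality 3a^2 ≤ |disc|, which follows from |b| ≤ a ≤ c. The function name is hypothetical.

```python
def reduced_forms(disc):
    """All reduced positive-definite forms (a, b, c) with b^2 - 4ac = disc < 0."""
    if disc >= 0 or disc % 4 not in (0, 1):
        raise ValueError("disc must be negative and congruent to 0 or 1 modulo 4")
    forms = []
    a = 1
    while 3 * a * a <= -disc:                # reduced forms satisfy 3a^2 <= |disc|
        for b in range(-a + 1, a + 1):       # -a < b <= a
            numerator = b * b - disc         # this must equal 4ac
            if numerator % (4 * a) == 0:
                c = numerator // (4 * a)
                if c > a or (c == a and b >= 0):
                    forms.append((a, b, c))
        a += 1
    return forms


print(reduced_forms(-20))
print(reduced_forms(-23))
```

For discriminant −20 the list contains the form with coefficients (1, 0, 5), that is x^2 + 5y^2, together with one further form, which is consistent with the failure of 7 to be represented by x^2 + 5y^2 even though some form of discriminant −20 represents it.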


Exercise 3.31. Prove that a positive-definite binary quadratic form is equiv- 
alent to a unique reduced quadratic form. 


Exercise 3.32. The class number of d is the number of equivalence classes 
of positive-definite forms with discriminant d. Prove that the class number is 
finite for any d. 


NOTES TO CHAPTER 3: Artin’s conjecture from Section 3.1 is still open — see the 
monograph [58, Section 3.2, 3.3] by Everest, van der Poorten, Shparlinski and Ward 
for descriptions of what is known and references to the literature. Shoup’s result 
can be found in his paper [138]. There is a discussion of the history of the Chinese 
Remainder Theorem in many places; see the ‘History of Mathematics’ Web site [113] 
for references. Mahler’s paper [102] gives an account of the method actually used by 
the early Chinese mathematicians, as opposed to the modern approach which follows 
Gauss [67]. We thank Robin Chapman for Exercise 3.12(b). Gauss was justly proud 
of having proved the Quadratic Reciprocity Law and many mathematicians have 
seen it since as foundational in the modern theory of numbers. The history and 
mathematics of the Quadratic Reciprocity Law and the development of reciprocity 
laws for higher degrees are described in Lemmermeyer’s monograph [98]. 


4


Recovering the Fundamental Theorem of 
Arithmetic 


This short chapter will explain how ideal theory was developed as a means 
of recovering from the failure of the Fundamental Theorem of Arithmetic 
witnessed in Chapter 3. We begin with a few historical remarks to set that 
development in context and go on to give a reasonably complete account of 
unique factorization of ideals in the ring of algebraic integers in a quadratic 
field. Finally we introduce the class number and the class group. 


4.1 Crisis 


The attempt to understand fully the problem we set out to study in the
last chapter exposed a phenomenon that represented something of a
historical crisis. During the nineteenth century, mathematicians had to come to
terms with the breakdown of the Fundamental Theorem of Arithmetic. In
March 1847, Lamé announced a proof of Fermat’s Last Theorem (described in
Section 2.5.1) to the Paris Academy, assuming (wrongly) that the
Fundamental Theorem of Arithmetic held in the ring $\mathbb{Z}[e^{2\pi i/n}]$ for every n > 1. Lamé
acknowledged that Liouville originally suggested this approach to Fermat’s
Last Theorem, but Liouville himself addressed the meeting and suggested
that there might be a problem with the assumption of unique factorization
into primes.

The question raised was this: Does unique factorization into primes hold
in the ring $\mathbb{Z}[e^{2\pi i/n}]$? This problem became a focal point for rapid
developments. On May 24th 1847, Liouville presented a letter from Kummer to
the Academy that settled the arguments. Kummer had proved in 1844 that
unique factorization fails in general, but his “ideal complex numbers”,
introduced in a paper of 1846, allowed a form of unique factorization to be recovered. By
September 1847, Kummer had presented a paper to the Berlin Academy in
which he proved that for p a regular prime¹ Fermat’s Last Theorem holds for
exponent n = p, essentially by Lamé’s method. In this paper, Kummer also
showed that 37 is not regular since 37 divides the numerator of $B_{32}$. Thus
Kummer proved Fermat’s Last Theorem for many indices and showed that
Lamé’s approach failed for others.


¹ A prime p is called regular if p does not divide the numerators of any of the
Bernoulli numbers $B_2, B_4, \ldots, B_{p-3}$; the Bernoulli numbers are defined on p. 203.

These dramatic developments did not lead to a proof of Fermat’s Last
Theorem but contributed to algebraic number theory in a profound way by
eventually leading to the result that rings such as $\mathbb{Z}[\sqrt{-5}]$ do have a kind of
Fundamental Theorem of Arithmetic, but at the level of ideals rather than
elements.


4.2 An Ideal Solution 


Definition 4.1. An ideal in a commutative ring R is a subgroup of the addi- 
tive group of R that is closed under multiplication by elements of R. 


It is easy to construct ideals in a commutative ring: Take all the multiples
\[ (a) = aR = \{ar \mid r \in R\} \]


of a single element a. In rings such as $\mathbb{Z}$ and $\mathbb{Z}[\sqrt{-2}]$, all ideals are of this
form, and this is true for any Euclidean ring.


Exercise 4.1. (a) Using the Euclidean Algorithm, prove that any ideal in $\mathbb{Z}$
has the form $(k) = k\mathbb{Z}$ for some $k \in \mathbb{Z}$.

(b) More generally, prove that in a Euclidean ring R any ideal has the
form (k) = kR, the multiples of a single element k.


Such singly-generated ideals are called principal, and any ring in which all 
ideals are principal is called a principal ideal domain. 

The statement in Exercise 4.1(b) is not true in all commutative rings. 
It is difficult to envisage what ideals look like in general; however, a more 
sophisticated version of the Fundamental Theorem of Arithmetic makes them 
easier to understand. This is described in Section 4.3 for quadratic fields. 

Any field K containing the rationals contains a ring $\mathcal{O}_K$ of algebraic
integers; this ring is a generalization of the usual integers in the rationals, and
is defined to be the set of all zeros in the field K of monic polynomials with
coefficients in $\mathbb{Z}$.


Exercise 4.2. Show that the ring of algebraic integers in $\mathbb{Q}(\sqrt{d})$ is $\mathbb{Z}[\sqrt{d}]$
if $d \equiv 2$ or 3 modulo 4 and is $\mathbb{Z}[(1 + \sqrt{d})/2]$ if $d \equiv 1$ modulo 4. (Hint: Start
by showing that any algebraic integer in $\mathbb{Q}(\sqrt{d})$ that is not in $\mathbb{Z}$ must satisfy
a quadratic equation.)


Exercise 4.3. By the previous exercise, the ring of algebraic integers in $\mathbb{Q}(\sqrt{6})$
(or $\mathbb{Q}(\sqrt{14})$) is $\mathbb{Z}[\sqrt{6}]$ (resp. $\mathbb{Z}[\sqrt{14}]$). Prove that the ring of algebraic integers
in $\mathbb{Q}(\sqrt{6}, \sqrt{14})$ is strictly larger than $\mathbb{Z}[\sqrt{6}, \sqrt{14}]$.


Exercise 4.4. Adapt the methods of Theorem 3.21 to show that the group
of units $\mathcal{O}_K^*$ inside the ring of algebraic integers $\mathcal{O}_K$ of the field $K = \mathbb{Q}(\sqrt{d})$
when d > 0 is square-free comprises $\{\pm u^n \mid n \in \mathbb{Z}\}$, where u > 1 is some unit
of $\mathcal{O}_K$. Such an element u is called a fundamental unit.


Exercise 4.5. Find fundamental units for the real quadratic fields


\[ \mathbb{Q}(\sqrt{2}),\ \mathbb{Q}(\sqrt{3}),\ \mathbb{Q}(\sqrt{5}),\ \text{and } \mathbb{Q}(\sqrt{7}). \]


Any element of $\mathbb{Q}(\sqrt{d})$ for a square-free integer d may be written uniquely
in the form $\alpha = x + y\sqrt{d}$ with x and y rational. This presents $\mathbb{Q}(\sqrt{d})$ as a
two-dimensional vector space over $\mathbb{Q}$ with basis $\{1, \sqrt{d}\}$.


Definition 4.2. The norm of $\alpha = x + y\sqrt{d}$ in $\mathbb{Q}(\sqrt{d})$ is defined to be
\[ N(\alpha) = x^2 - dy^2 \]


and the trace
\[ T(\alpha) = 2x. \]


Exercise 4.6. The map $\beta \mapsto \alpha\beta$ on $\mathbb{Q}(\sqrt{d})$ is a $\mathbb{Q}$-linear map on the $\mathbb{Q}$ vector
space $\mathbb{Q}(\sqrt{d})$. Find the 2 × 2 matrix determined by this map, and show that
the absolute value of its determinant is $|N(\alpha)|$ and its trace is $T(\alpha)$.


Unique factorization will be recovered in Section 4.3 by working with prime 
ideals in the algebraic integers. These matters represent the beginnings of 
an important subject called algebraic number theory. The recovery of the 
Fundamental Theorem of Arithmetic at the level of ideals represents a major 
achievement that continues to influence the development of number theory 
and geometry. 

Theorem 2.14 on p. 55 gives a different way to recover the Fundamental 
Theorem of Arithmetic, used to dramatic effect in Theorem 2.13, but the 
development of ideal theory proved to be of much greater importance. 


4.3 Fundamental Theorem of Arithmetic for Ideals 


We begin with a natural definition of multiplication on ideals. Subsequently, 
we introduce a notion of prime ideal, then we go on to show that every non- 
trivial ideal factorizes as a product of prime ideals in a way that is unique. 


Definition 4.3. Let I and J denote ideals in a commutative ring. The sum
and product of I and J are defined by $I + J = \{a + b \mid a \in I, b \in J\}$, while IJ
is the additive subgroup generated by the set $\{ab \mid a \in I, b \in J\}$.


Exercise 4.7. If I and J denote ideals in a commutative ring, prove that the
sum and the product of I and J are also ideals.

Sums and products of more than two ideals are defined in an entirely
analogous fashion and again turn out to be ideals.
It might have seemed more natural to define IJ to be the set


\[ \{ab \mid a \in I, b \in J\} \]


rather than the subgroup this set generates; however, the set of products by
itself is not always closed under addition.


Exercise 4.8. Give an example of a commutative ring R together with two
ideals I and J such that the set $\{ab \mid a \in I, b \in J\}$ is not an ideal.


If $\alpha = x + y\sqrt{d} \in K = \mathbb{Q}(\sqrt{d})$, write $\alpha^* = x - y\sqrt{d}$ for the conjugate of $\alpha$.
If d < 0, then this is the usual complex conjugate; for d > 0 this terminology
comes from Galois theory. For an ideal I, write $I^*$ for the set of conjugates
of the elements $\alpha \in I$.


Exercise 4.9. Let I and J denote ideals in $\mathcal{O}_K$.
(a) Show that $I^*$ is an ideal in $\mathcal{O}_K$.

(b) Show that $(I + J)^* = I^* + J^*$.

(c) Show that $(IJ)^* = I^*J^*$.


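If $\alpha_1, \ldots, \alpha_k$ are elements of $\mathcal{O}_K$, write


\[ (\alpha_1, \ldots, \alpha_k) \]


for the ideal
\[ (\alpha_1) + \cdots + (\alpha_k) = \alpha_1\mathcal{O}_K + \cdots + \alpha_k\mathcal{O}_K \]


generated by $\alpha_1, \ldots, \alpha_k$. Also define
\[ \langle\alpha_1, \ldots, \alpha_k\rangle = \alpha_1\mathbb{Z} + \cdots + \alpha_k\mathbb{Z} \]


for the additive subgroup of $\mathcal{O}_K$ generated by $\alpha_1, \ldots, \alpha_k$. It is important to
distinguish these different types of generation.

In what follows, we are going to work with the full ring of algebraic integers
in the field $\mathbb{Q}(\sqrt{d})$ for a square-free integer d. Following Exercise 4.2, define $\delta$
to be $(1 + \sqrt{d})/2$ if $d \equiv 1$ modulo 4 and $\sqrt{d}$ if $d \equiv 2$ or 3 modulo 4. Thus,
if $K = \mathbb{Q}(\sqrt{d})$, then $\mathcal{O}_K = \mathbb{Z}[\delta]$. Ideals in $\mathcal{O}_K$, although not always principal,
can always be generated as ideals by two elements.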


Theorem 4.4. Let I denote an ideal in $\mathcal{O}_K$. Then there are elements $\alpha, \beta$ in I
with $I = (\alpha, \beta)$.


PROOF. Since $\mathcal{O}_K$ as an additive group is a subgroup of $\mathbb{Q}^2$, it follows that I
can be generated as an additive group by two elements. We will first show
that one of these elements can be chosen to lie in $\mathbb{Z}$. Let


\[ B = \{b \in \mathbb{Z} \mid a + b\delta \in I \text{ for some } a \in \mathbb{Z}\}; \]
then B is an ideal of $\mathbb{Z}$. Hence $B = g\mathbb{Z}$ for some $g \in \mathbb{Z}$ and similarly $I \cap \mathbb{Z} = h\mathbb{Z}$
for some $h \in \mathbb{Z}$. Since $g \in B$, there must be $c \in \mathbb{Z}$ with $c + g\delta \in I$.
We claim that
\[ I = \langle c + g\delta, h\rangle. \tag{4.1} \]


Clearly $\langle c + g\delta, h\rangle \subseteq I$. Now assume that $a + b\delta \in I$ with $a, b \in \mathbb{Z}$.
Since $b \in B$, b = eg for some $e \in \mathbb{Z}$. Therefore
\[ a - ec = a + b\delta - e(c + g\delta). \]
This is an element of $I \cap \mathbb{Z}$, so it can be written as fh for some $f \in \mathbb{Z}$. Then
\[ a + b\delta = a - ec + e(c + g\delta) = fh + e(c + g\delta) \in \langle c + g\delta, h\rangle, \]


showing Equation (4.1).

To finish the proof of the theorem, use Equation (4.1) to write $\alpha = c + g\delta$
and $\beta = h$. Then $\alpha, \beta \in I$, so $(\alpha, \beta) \subseteq I$. Conversely, if $\gamma \in I$, then for
integers m and n,

\[ \gamma = m\alpha + n\beta, \]


so $I \subseteq (\alpha, \beta)$, which concludes the proof.


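As a final step toward proving the Fundamental Theorem of Arithmetic
for ideals in $\mathcal{O}_K$, we note the following lemma.


Lemma 4.5. [HURWITZ’S LEMMA] If $\alpha, \beta$ are elements of $\mathcal{O}_K$ and $k \in \mathbb{Z}$
divides $N(\alpha)$, $N(\beta)$, and $T(\alpha\beta^*)$, then k divides $\alpha\beta^*$ and $\alpha^*\beta$ in $\mathcal{O}_K$.


Exercise 4.10. Prove Hurwitz’s Lemma. (Hint: This only uses simple
properties of the norm and trace functions.)


Corollary 4.6. Let I denote any ideal of $\mathcal{O}_K$. Then $II^*$ is a principal ideal (k)
generated by an integer $k \in \mathbb{Z}$.


PROOF. We know that $I = (\alpha, \beta)$ for some $\alpha, \beta$, so
\[ II^* = (\alpha, \beta)(\alpha^*, \beta^*) = (\alpha\alpha^*, \alpha\beta^*, \beta\alpha^*, \beta\beta^*). \]


This means that $II^*$ contains the integers $N(\alpha) = \alpha\alpha^*$ and $N(\beta)$, as well
as $T(\alpha\beta^*) = \alpha\beta^* + \alpha^*\beta$. If k denotes the greatest common divisor of these
integers then $k \in II^*$, so $(k) \subseteq II^*$. Now $k \mid N(\alpha)$, $k \mid N(\beta)$, and $k \mid T(\alpha\beta^*)$ and
hence, by Hurwitz’s Lemma, $k \mid \alpha\beta^*$ and $k \mid \beta\alpha^*$, so $II^* \subseteq (k)$ as claimed.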


The integer k appearing in Corollary 4.6 may be taken as positive without 
loss of generality since kZ = —kZ. 


Definition 4.7. If $I$ denotes any ideal of $O_K$, then the unique integer $k > 0$
with $II^* = k\mathbb{Z}$ is called the norm of $I$, written $N(I)$.


Corollary 4.8. (1) For $I = (\alpha, \beta)$,

\[ N(I) = \gcd(N(\alpha), N(\beta), T(\alpha\beta^*)). \]

(2) If $I = (\alpha)$ is a principal ideal, then $N(I) = |N(\alpha)|$.
(3) The norm is multiplicative: $N(IJ) = N(I)N(J)$ for all ideals $I$ and $J$.
(4) $N(I) = [O_K : I]$, the group-theoretic index of $I$ as a subgroup of $O_K$.


PROOF. (1) This appeared in the proof of Corollary 4.6.
(2) This follows because $(\alpha)(\alpha^*) = (\alpha\alpha^*)$.
(3) $(N(IJ)) = IJ(IJ)^* = (II^*)(JJ^*) = (N(I))(N(J)) = (N(I)N(J))$.
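
The gcd formula in Corollary 4.8(1) is easy to try out numerically. The following is a minimal sketch under a simplifying assumption: elements are taken of the form $a + b\sqrt{d}$ with $a, b \in \mathbb{Z}$ and represented as pairs `(a, b)`, which covers all of $O_K$ only when $d \not\equiv 1$ modulo 4. The function names are illustrative and do not come from the text.

```python
from math import gcd


def norm(a, b, d):
    """Norm of a + b*sqrt(d): (a + b*sqrt(d)) * (a - b*sqrt(d)) = a^2 - d*b^2."""
    return a * a - d * b * b


def trace_alpha_beta_star(alpha, beta, d):
    """T(alpha * beta^*) = alpha*beta^* + alpha^*beta = 2*(a1*a2 - d*b1*b2)."""
    (a1, b1), (a2, b2) = alpha, beta
    return 2 * (a1 * a2 - d * b1 * b2)


def ideal_norm(alpha, beta, d):
    """N((alpha, beta)) = gcd(N(alpha), N(beta), T(alpha beta^*)), as in Corollary 4.8(1)."""
    g = gcd(norm(*alpha, d), norm(*beta, d))
    return gcd(g, trace_alpha_beta_star(alpha, beta, d))


if __name__ == "__main__":
    # Ideals in Z[sqrt(-5)] generated by 2 or 3 together with 1 + sqrt(-5).
    print(ideal_norm((2, 0), (1, 1), -5))
    print(ideal_norm((3, 0), (1, 1), -5))
```

For example, the ideal $(2, 1 + \sqrt{-5})$ has $N(2) = 4$, $N(1 + \sqrt{-5}) = 6$ and $T(2(1 - \sqrt{-5})) = 4$, so the formula gives norm 2.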


Exercise 4.11. Prove Corollary 4.8(4). (Hint: if $I = (h, c + g\delta)$ is a nonzero ideal
of $O_K$, then $N(I) = gh$.)


Corollary 4.9. If $I$, $J$ and $K$ are ideals of $O_K$ with $I \ne \{0\}$ and $IJ = IK$,
then $J = K$.


PROOF. This is obvious if $I = (\alpha)$ is principal because in that case $IJ = \alpha J$
so $J = \alpha^{-1}(IJ)$. Similarly, $K = \alpha^{-1}(IK) = \alpha^{-1}(IJ) = J$. In general, the
identity $IJ = IK$ implies that

\[ (II^*)J = (IJ)I^* = (IK)I^* = (II^*)K, \]

and the result follows as before, since $II^* = (N(I))$ is principal.


This important "cancellation" property of ideals in $O_K$ will play a key role
in the proof of the Fundamental Theorem of Arithmetic for ideals.


Definition 4.10. If $I$ and $J$ are ideals in $O_K$, we write $I \mid J$ ($I$ divides $J$) if
there is an ideal $K$ in $O_K$ with $J = IK$.


Notice that $IK \subseteq I$, so if $I \mid J$ then $J \subseteq I$.

Lemma 4.11. Given two ideals $I$ and $J$ in $O_K$, $I \mid J$ if and only if $J \subseteq I$.

PROOF. One direction is already proved, so assume that $J \subseteq I$. Then

\[ JI^* \subseteq II^* = (N(I)), \]

so

\[ K = \frac{1}{N(I)} JI^* \]

is an ideal contained in $O_K$. It follows that

\[ IK = \frac{1}{N(I)} I(JI^*) = \frac{1}{N(I)} J(II^*) = \frac{1}{N(I)} J(N(I)) = J, \]

and hence $I \mid J$ as claimed.


In what follows, we see a real duplication of ideas from Chapter 1, worked
out in the context of ideals. The interchangeability of inclusion and divisibility
for ideals will be used repeatedly.

Definition 4.12. A nonzero ideal $I \ne R$ in a commutative ring $R$ is called
maximal if for any ideal $J \ne R$, $J \mid I$ implies that $J = I$. An ideal $P$ is prime
if $P \mid IJ$ implies that $P \mid I$ or $P \mid J$.


Exercise 4.12. In a commutative ring R, let M and P denote ideals. 

(a) Show that M is maximal if and only if the quotient ring R/M is a field. 
(b) Show that $P$ is prime if and only if $R/P$ is an integral domain (that is,
in $R/P$ the equation $ab = 0$ forces either $a$ or $b$ to be 0).

(c) Deduce that every maximal ideal is prime. 


Theorem 4.13. [FUNDAMENTAL THEOREM OF ARITHMETIC FOR IDEALS]
Any nonzero proper ideal in $O_K$ can be written as a product of prime ideals,
and that factorization is unique up to order.


PROOF. If the ideal $I$ is not maximal, it can be written as a product of two
nontrivial ideals. Comparing norms shows these ideals must have norms smaller
than $N(I)$. Keep going: The sequence of norms is descending, so it must
terminate, resulting in a finite factorization of $I$. By Exercise 4.12, every
maximal ideal is prime, so all that remains is to demonstrate that the resulting
factorization is unique. This uniqueness follows from Corollary 4.9, which allows
cancellation of nonzero ideals common to two products.


4.4 The Ideal Class Group 


In this section, we are going to see how the nineteenth-century mathematicians
interpreted Exercise 3.32 on p. 81 in terms of quadratic fields. The major result
we will present is that ideals in $O_K$, for a quadratic field $K$, can be described
using a finite list of representatives $I_1, \dots, I_h$; any nontrivial ideal $I$ can be
written $I_iP$, where $1 \le i \le h$ and $P$ is a principal ideal. Thus $h$, known as the
class number, measures the extent to which $O_K$ fails to be a principal ideal
domain. This statement was proved for arbitrary algebraic number fields and
has been influential in the way number theory developed in the twentieth
century.

Given two ideals $I$ and $J$ in $O_K$, define a relation $\sim$ by

\[ I \sim J \text{ if and only if } I = \lambda J \text{ for some } \lambda \in K^*. \]

Exercise 4.13. Show that $\sim$ is an equivalence relation.

We are going to outline a proof of the following important theorem.


Theorem 4.14. There are only finitely many equivalence classes of ideals
in $O_K$ under $\sim$.

One class is easy to spot: namely the one consisting of all principal ideals.
Of course, $O_K$ is a principal ideal domain if and only if there is only one class
under the relation. One can define a multiplication on classes: If $[I]$ denotes
the class containing $I$, then one can show that the multiplication defined by

\[ [I][J] = [IJ] \tag{4.2} \]

is independent of the representatives chosen.

Corollary 4.15. The set of classes under $\sim$ forms a finite Abelian group.


The group in Corollary 4.15 is known as the ideal class group of $K$ (or just
the class group).

PROOF OF COROLLARY 4.15. In the class group, associativity of multiplication
is inherited from $O_K$. The element $[O_K]$ acts as the identity. Finally, given
any nonzero ideal $I$, the relation $II^* = (N(I))$ shows that the inverse of the
class $[I]$ is $[I^*]$.


Lemma 4.16. Given a square-free integer $d \ne 1$, there is a constant $C_d$ that
depends upon $d$ only such that for any nonzero ideal $I$ of $O_K$, $K = \mathbb{Q}(\sqrt{d})$,
there is a nonzero element $\alpha \in I$ with $|N(\alpha)| \le C_d N(I)$.


Exercise 4.14. *Prove Lemma 4.16. The basic idea is a technique similar to
that used in the proof of Theorem 3.21 showing that a lattice point must
exist in a region constrained by various inequalities. Since the original proof,
considerable efforts have gone into decreasing the constant $C_d$ for practical
application. The best techniques use the geometry of numbers, a theory initiated
by Minkowski.


PROOF OF THEOREM 4.14. First show that every class contains an ideal
whose norm is bounded by $C_d$. Given a class $[I]$, apply Lemma 4.16 with $I^*$
replacing $I$ to obtain a nonzero $\alpha \in I^*$. Now $(\alpha) \subseteq I^*$, so we can write
$(\alpha) = I^*J$ for some ideal $J$.

This gives a relation $[I^*][J] = [(\alpha)]$ in the class group, which means
that $[J]$ is the inverse of $[I^*]$. However, we remarked earlier that $[I]$ and $[I^*]$
are mutual inverses in the class group. Hence $[I] = [J]$. Now

\[ |N(\alpha)| = N((\alpha)) = N(I^*)N(J). \]

Since the left-hand side is bounded by $C_d N(I^*)$, we can cancel $N(I^*)$ to
obtain $N(J) \le C_d$.

Now the theorem follows easily: For any given integer $k > 0$, there are only
finitely many ideals of norm $k$; this is because any ideal must be a product
of prime ideals of norm $p$ or $p^2$, where $p$ runs through the prime factors of $k$.
There are only finitely many such prime ideals and hence there are only finitely
many ideals of norm $k$. Now apply this to the integers $k \le C_d$ to deduce that
there are only finitely many ideals of norm bounded by $C_d$. Since each class
contains an ideal whose norm is thus bounded, by the first part of the proof,
it follows that there are only finitely many classes.

Exercise 4.15. Investigate the relationship between quadratic forms and ideals
in quadratic fields. In particular, show that Exercise 3.32 on p. 81 is equivalent
to Theorem 4.14. (Hint: If $I$ denotes an ideal with basis $\{\alpha, \beta\}$, show
that for $x, y \in \mathbb{Z}$, $N(x\alpha + y\beta)/N(I)$ is a (binary) integral quadratic form. How
does a change of basis for $I$ relate to the form? What effect does multiplying $I$
by a principal ideal have on the form?)


4.4.1 Prime Ideals 


To better understand prime ideals, we close with an exercise that links up the 
various trains of thought in this chapter and shows that ideal theory better 
explains the various phenomena encountered in Chapter 3. 


Exercise 4.16. Factorize the ideal $(6)$ into prime ideals in $\mathbb{Z}[\sqrt{-5}]$, expressing
each prime factor in the form $(a, b + c\sqrt{-5})$.


Exercise 4.17. Let $O_K$ denote the ring of algebraic integers in the quadratic
field $K = \mathbb{Q}(\sqrt{d})$ for a square-free integer $d$.

(a) If $P$ is a prime ideal in $O_K$, show that $P \mid (p)$ for some integer prime $p \in \mathbb{Z}$.
(b) Show that there are only three possibilities for the factorization of the
ideal $(p)$ in $O_K$:

    $(p) = P_1P_2$, where $P_1$ and $P_2$ are prime ideals in $O_K$ ($p$ splits);
    $(p) = P$, where $P$ is a prime ideal in $O_K$ ($p$ is inert);
    $(p) = P^2$, where $P$ is a prime ideal in $O_K$ ($p$ is ramified).


This should be compared with the possible primes in $\mathbb{Z}[i]$ described in
Theorem 2.8(3). The following exercise gives a complete description of splitting
types in terms of the Legendre symbol.


Exercise 4.18. Let $O_K$ denote the ring of algebraic integers in the quadratic
field $K = \mathbb{Q}(\sqrt{d})$ for a square-free integer $d$. Let $D = d$ if $d \equiv 1$ modulo 4
and let $D = 4d$ otherwise. Show that an odd prime $p$ is inert, ramified, or
split as the Legendre symbol $\left(\frac{D}{p}\right)$ is $-1$, $0$, or $+1$, respectively. What are the
possibilities when $p = 2$?
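
The criterion stated in Exercise 4.18 can be explored numerically before it is proved. The sketch below takes the statement of the exercise for granted and evaluates the Legendre symbol by Euler's criterion; the function names are illustrative, and the code handles odd primes $p$ only.

```python
def discriminant(d):
    """D = d if d is congruent to 1 modulo 4, and D = 4d otherwise (d square-free)."""
    return d if d % 4 == 1 else 4 * d


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, computed via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def splitting_type(d, p):
    """Behavior of the odd prime p in Q(sqrt(d)), per the statement of Exercise 4.18."""
    return {1: "split", 0: "ramified", -1: "inert"}[legendre(discriminant(d), p)]


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, splitting_type(-1, p), splitting_type(-5, p))
```

With $d = -1$ this reproduces the familiar pattern in $\mathbb{Z}[i]$: primes congruent to 1 modulo 4 split and primes congruent to 3 modulo 4 are inert.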


We should say something about the terminology. Splitting and inertia are
fairly obvious, the latter signifying that the prime $p$ remains prime in this
bigger ring, just as primes $p \equiv 3$ modulo 4 remain primes in $\mathbb{Z}[i]$. The term
"ramify" means literally to branch, and we see here something of an overlap
with the theory of functions. A function such as $y = \sqrt{x}$ really consists of
two possible branches. This notion was borrowed deliberately to name the
phenomenon seen in number theory, where a prime in $\mathbb{Z}$ becomes a power of
a prime in a larger ring. We end this chapter with a definition because it is
going to appear again in Chapter 11.

Definition 4.17. Let $K = \mathbb{Q}(\sqrt{d})$ denote a quadratic field, where $d$ is a
square-free integer. Define $D$ by

\[ D = \begin{cases} d & \text{if } d \equiv 1 \text{ modulo } 4, \\ 4d & \text{otherwise.} \end{cases} \tag{4.3} \]

Then $D$ is called the discriminant of the quadratic field $K = \mathbb{Q}(\sqrt{d})$.


NOTES TO CHAPTER 4: Much of this chapter was based on Robin Chapman's
excellent expository notes. To see the details worked out economically in the general
case, consult Lang's book [96]. Lemma 4.16 is proved as Theorem 4 on p. 119 of that
book; Chapter V is an excellent introduction to Minkowski's geometry of numbers.


Elliptic Curves


One of the many powerful ideas that have been brought to bear on problems in 
number theory is a connection between Diophantine problems and geometry. 
Exercise 2.2 on p. 45 gave a hint of this phenomenon; the geometric structure 
in that case was a unit circle, an object with algebraic structure in that the 
points of the unit circle form a group. In this chapter, we introduce a family 
of curves with a group structure. The main aim is to develop a working un- 
derstanding of the group operation and to illustrate this with many examples. 
In subsequent chapters we will make these ideas more rigorous. 


5.1 Rational Points 


Having studied the Pythagorean equation, which has infinitely many integral
solutions, perhaps the existence of so few integral solutions to the equation¹

\[ y^2 = x^3 - 2 \]

seems a little disappointing. However, the solution $(3, 5)$ has an amazing
property. We can use this one integral solution to generate other exotic rational
solutions to the equation. It may not seem obvious, but we can use this
solution to generate the solution $\left(\frac{129}{100}, \frac{383}{1000}\right)$. Moreover, in a precise sense, this
rational solution is the next simplest solution to the equation.


¹ The equation $y^2 = x^3 + C$ is sometimes called Bachet's equation after Claude
Bachet (1581-1638). Bachet is most famous for translating the Arithmetica of
Diophantus from Greek into Latin. This is the book in which Fermat wrote his
famous marginal note asserting what is now called Fermat's Last Theorem. In
addition, Bachet discovered the duplication formula for this curve, showing that
if $(x, y)$ is a solution, then

\[ \left( \frac{x^4 - 8Cx}{(2y)^2}, \; \frac{-x^6 - 20Cx^3 + 8C^2}{(2y)^3} \right) \]

is also a (potentially different) solution.

To see how this is done, first construct the tangent to the curve at the
point $P = (3, 5)$. The slope there is $3x^2/(2y) = 27/10$, so the tangent has equation

\[ y = \frac{27}{10}x - \frac{31}{10}. \]

If we substitute the equation for this line into the equation of the curve, then
(we claim) the line will meet the curve at another point, and this point will
have rational coordinates.² To see this more explicitly, note that when we
substitute, we get a cubic equation for $x$,

\[ \left( \frac{27x - 31}{10} \right)^2 = x^3 - 2. \]

We claim that $x = 3$ is a double root of this equation. It is a root because
substituting $x = 3$ gives 25 on both sides of the equation; differentiating
and substituting shows it is a double root because you get 27 on both sides.

To find the third point of intersection, use the sum of roots formula. For
a cubic, this says that if $x_1$, $x_2$, and $x_3$ are the three zeros of the cubic

\[ x^3 + ax^2 + bx + c, \]

then $x_1 + x_2 + x_3 = -a$ (see Exercise 5.11 on p. 105). Applying this, and
letting $x$ denote the third root, we see that

\[ x + 3 + 3 = \left( \frac{27}{10} \right)^2. \]

Solving this for $x$ gives $x = \frac{129}{100}$. To find $y$, use the equation of the tangent to
see that $y = \frac{383}{1000}$.
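
The tangent computation above can be replayed in exact rational arithmetic. This is a small sketch using Python's `fractions` module; the function name is illustrative, and the curve $y^2 = x^3 - 2$ is hard-coded.

```python
from fractions import Fraction


def tangent_third_point(x0, y0):
    """Third intersection of the tangent at (x0, y0) with the curve y^2 = x^3 - 2.

    Returns (slope, x, y), where (x, y) is the further point on the tangent line.
    """
    x0, y0 = Fraction(x0), Fraction(y0)
    slope = 3 * x0 * x0 / (2 * y0)      # implicit differentiation: dy/dx = 3x^2 / (2y)
    x = slope * slope - 2 * x0          # sum of roots: x + x0 + x0 = slope^2
    y = y0 + slope * (x - x0)           # the point lies on the tangent line
    return slope, x, y


if __name__ == "__main__":
    slope, x, y = tangent_third_point(3, 5)
    print(slope, x, y)
    print(y * y == x ** 3 - 2)          # the new point satisfies the curve equation
```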


It is tempting to try this again. We cannot expect anything by joining our
new point back to $P$. However, we could join the other integral solution $(3, -5)$
to the new point to see where the line meets the curve again. Technically, it
is better to reflect the new point in the x-axis and try to join that to our first
point (for reasons that will become apparent later). Thus we define $P_1 = (3, 5)$
and $P_2$ to be $\left(\frac{129}{100}, -\frac{383}{1000}\right)$. Recursively define $P_n$ to be the reflection in the
x-axis of the third point of intersection with the curve of the line joining $P$
to $P_{n-1}$. The next point is

\[ P_3 = \left( \frac{164323}{29241}, \; -\frac{66234835}{5000211} \right), \]

from which we obtain

\[ P_4 = \left( \frac{2340922881}{58675600}, \; \frac{113259286337279}{449455096000} \right). \]

² If you are comfortable with geometrical notions, then you will accept that since $P$
is already a double point, the third point must be rational.

It is an amazing fact that there are infinitely many rational points on this
curve and (up to reflection in the x-axis) they can all be constructed in this
way, starting with $P$.

This example already exhibits some typical behavior; for example, the
denominator of the x-coordinate of $P_3$ is a square while the denominator of
the y-coordinate is the cube of the same number:

\[ P_3 = \left( \frac{164323}{171^2}, \; -\frac{66234835}{171^3} \right). \]


Exercise 5.1. Prove that any rational point on the curve $y^2 = x^3 - 2$ must
have the shape $P = (A/B^2, C/B^3)$ for coprime integers $A$, $B$, $C$.


Example 5.1. To start to understand what is going on in the geometrical
iteration that produces the points $P_n$, consider the sequence $(B_n)$, where $B_n$ is
the square root of the denominator of the x-coordinate of $P_n$. The first few
values are shown in Table 5.1.


Table 5.1. Growth in the values of $B_n$.

 n   B_n
 1   1
 2   10
 3   171
 4   7660
 5   12660211
 6   22652313570
 7   58809175344521
 8   1735132266687114280
 9   357172782187144055262201
10   115455343251682907198856192050
11   30298854203539385536028167296302051
12   689991490842950483313935163766440646064580
13   22743339816243727151383520741637996456735801712571
14   1301982234059157037070228212465238100265563723924858470330
15   45687890972429224342713610900040552323182688307706080693278173039
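
The first entries of Table 5.1 can be regenerated with exact rational arithmetic. The sketch below implements the chord-and-tangent step for the curve $y^2 = x^3 - 2$ only and ignores the point at infinity, which does not arise for these points; the function names are illustrative.

```python
from fractions import Fraction
from math import isqrt


def add(P, Q):
    """Reflection in the x-axis of the third intersection of the line PQ with y^2 = x^3 - 2."""
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        lam = 3 * x1 * x1 / (2 * y1)     # tangent slope at P
    elif x1 == x2:
        raise ValueError("vertical line: the sum is the point at infinity")
    else:
        lam = (y2 - y1) / (x2 - x1)      # chord slope
    x3 = lam * lam - x1 - x2             # sum of roots of the cubic in x
    y3 = -(y1 + lam * (x3 - x1))         # reflect the third intersection point
    return (x3, y3)


def b_sequence(count):
    """Return [B_1, ..., B_count], where B_n^2 is the denominator of x(P_n)."""
    P = (Fraction(3), Fraction(5))
    Pn, values = P, []
    for _ in range(count):
        den = Pn[0].denominator
        B = isqrt(den)
        assert B * B == den              # the denominator is a square (Exercise 5.1)
        values.append(B)
        Pn = add(P, Pn)
    return values


if __name__ == "__main__":
    for n, B in enumerate(b_sequence(8), start=1):
        print(n, B, len(str(B)))         # index, B_n, number of decimal digits
```

Printing the number of decimal digits alongside $B_n$ makes the roughly quadratic growth visible directly.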


The lengths of the numbers $B_n$ written out in decimal digits seem to grow
quadratically in $n$. The number of digits in $B_n$ is approximately $\log_{10} B_n$, so
this suggests a relationship between $\log B_n$ and $n^2$. The following beautiful
result makes this precise.


Theorem 5.2. There is a constant $h > 0$ for which

\[ \frac{1}{n^2} \log B_n \to h \quad \text{as } n \to \infty, \]

where $(B_n)$ is the sequence in Table 5.1.

We are not going to prove this result. An easy consequence of Theorem 5.2
is the finiteness of the number of integral points in the sequence $(P_n)$. Later,
we will prove that the maximum of $\log |A_n|$ and $2 \log |B_n|$ also grows as in the
statement of Theorem 5.2 (see the comments after Theorem 7.13 on p. 147).
In Section 7.4.1, Theorem 7.15, we will relate the growth rates of $\log |A_n|$
and $2 \log |B_n|$ to each other.

The geometrical operation taking $P_n$ to $P_{n+1}$ described above is a
special case of a more general one: There is a binary operation on the set of
points $(x, y)$ satisfying the equation $y^2 = x^3 - 2$ that behaves like a group
law. (At this point there is no indication of an identity.) Indeed, we can
define such an operation on the set of points satisfying any equation of the
form $y^2 = x^3 + ax^2 + bx + c$ under the nondegeneracy condition of no repeated
zeros used in Siegel's Theorem (Theorem 2.13 on p. 54).

Let $E$ denote the set of points $(x, y)$ with $y^2 = x^3 + ax^2 + bx + c$, assume
that the cubic has no repeated zeros, and define a binary operation $+$ on the
curve $E$ as follows. If $P$ and $Q$ are points on $E$, then the line through $P$ and $Q$
meets $E$ in exactly one further point, say $(x, y)$. The reflection $R = (x, -y)$
of $(x, y)$ in the x-axis is then defined to be $P + Q$ (see Figure 5.1). The
case $P = Q$ requires a notion of tangency (which can be defined for curves
over any field, using order of vanishing), and then $2P$ is obtained by reflecting
the unique other point of intersection of the line tangent to the curve at $P$ in
the x-axis. The tangent is well-defined by the nondegeneracy condition.
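
The construction just described translates directly into a few lines of exact arithmetic. The following is a sketch only: the point at infinity is represented by `None`, an encoding chosen here for illustration, and the slope of the tangent is obtained by implicit differentiation of $y^2 = x^3 + ax^2 + bx + c$.

```python
from fractions import Fraction

INFINITY = None  # illustrative stand-in for the point at infinity


def add_points(P, Q, a, b, c):
    """P + Q on y^2 = x^3 + a*x^2 + b*x + c by the chord-and-tangent construction."""
    if P is INFINITY:
        return Q
    if Q is INFINITY:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return INFINITY                                   # vertical line
    if P == Q:
        lam = Fraction(3 * x1 * x1 + 2 * a * x1 + b) / (2 * y1)   # tangent slope
    else:
        lam = Fraction(y2 - y1) / (x2 - x1)               # chord slope
    x3 = lam * lam - a - x1 - x2                          # the three roots sum to lam^2 - a
    y3 = -(y1 + lam * (x3 - x1))                          # reflect in the x-axis
    return (x3, y3)


if __name__ == "__main__":
    # On y^2 = x^3 + 1 the points (2, 3), (0, 1) and (-1, 0) are collinear.
    print(add_points((2, 3), (0, 1), 0, 0, 1))
    # Doubling (3, 5) on y^2 = x^3 - 2.
    print(add_points((3, 5), (3, 5), 0, 0, -2))
```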


Exercise 5.2. Draw the curve $y^2 = x^2(x + 1)$. Show that the tangent at $(0, 0)$
is not well-defined.


Theorem 5.3. The set E with binary operation + forms an Abelian group 
after adding one point “at infinity.” 


A natural question is to ask what the identity of the group is, and this
will be fully resolved (and the theorem proved) in the next chapter. At this
stage, we have to confess that the identity element does not appear to exist:
it is the point "at infinity". For now, we can think of this as a formal single
point added to the plane with the property that it lies on any vertical line of
the form $x = \text{constant}$. We will give more justification for this claim and will
return to the question when we have described a fascinating class of functions
that lie behind the theory of elliptic curves (see Chapter 6). The identity
element is the point at infinity, which in Figure 5.1 may be thought of as being
reached by moving infinitely far up (or down) the y-axis. Additive inverses
are given by reflection in the x-axis, so if $P = (x, y)$ then $-P = (x, -y)$. In
Figure 5.2, the point $P = (0, 1)$ on the curve $y^2 = x^3 - 3x + 1$ is shown, with
a sequence of points approaching $-P = (0, -1)$ shown being added to $P$; the
third point of intersection is approaching 0, the point at infinity. Take care
not to confuse 0, the point at infinity, with the origin $(0, 0)$.


Figure 5.1. The binary operation on $y^2 = x^3 + 1$, showing $(2, 3) + (0, 1) + (-1, 0) = 0$.


Exercise 5.3. Draw a picture of the $(x, y)$ plane with a unit sphere whose
South pole is tangent to the plane at $(0, 0)$. Define a map from the plane
to the sphere by sending a point $P$ on the plane to the unique point on the
sphere that is collinear with $P$ and the North pole. Show that the closure of
the image of a curve $y^2 = x^3 + ax^2 + bx + c$ in the sphere contains the North
pole. This single point may be thought of as giving a single "point at infinity"
on the curve.


A more subtle question is how to verify the associative law for the binary
operation. This is so familiar in ordinary addition that we are prone to
overlook it. When it is encountered in matrix multiplication, it follows from the
associative law in the underlying ring. Here a different principle is at work:
Although it is still true that the law is inherited from the complex numbers, it
is so via a bijection involving transcendental functions. In the twentieth
century, algebraic geometers sought to understand this phenomenon in a more
abstract way. The subject of Abelian varieties is a deep and powerful one
about geometric objects, defined over arbitrary fields, with an Abelian group
structure.


Figure 5.2. Points converging to $-P = (0, -1)$ showing the point at infinity.


Exercise 5.4. Convince yourself that the associative law holds for an elliptic 
curve with the geometrical binary operation. In other words, choose a specific 
elliptic curve and plot it accurately. Choose three arbitrary points P, Q and R. 
Now demonstrate geometrically that the point you get by adding R to P+Q 
is the same as the one you get by adding P to Q+ R. 


5.2 The Congruent Number Problem


In this section, we introduce a problem from antiquity that was recently
reinterpreted using the theory of elliptic curves. A natural number-theoretic
question arises with the familiar $(3, 4, 5)$ triangle in Figure 5.3. This triangle,
which we may think of as being defined by the triple of integers $(3, 4, 5)$, has
integral sides and integral area: What other triples of integers, or rationals,
have this property?


Figure 5.3. Six is a congruent number.


Example 5.4. There is a right-angled triangle with rational sides and area 5.
The triple $\left(\frac{3}{2}, \frac{20}{3}, \frac{41}{6}\right)$ is Pythagorean: Expressing these as fractions over 6
and checking that

\[ 9^2 + 40^2 = 41^2 \]

confirms the triangle with the sides given is right-angled. The area of the
triangle is easily computed to be $\frac{1}{2} \cdot \frac{3}{2} \cdot \frac{20}{3} = 5$.


We will see later that there are arbitrarily complicated examples of this 
sort. 


Example 5.5. The triple

\[ \left( \frac{2017680}{1437599}, \; \frac{1437599}{168140}, \; \frac{2094350404801}{241717895860} \right) \]

gives a right-angled triangle with rational sides and area 6.
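
Both examples can be checked mechanically. The following sketch uses exact fractions; the function and constant names are illustrative, and the side lengths are those quoted in Examples 5.4 and 5.5.

```python
from fractions import Fraction as F


def is_right_triangle_with_area(sides, area):
    """True if (x, y, z) satisfies x^2 + y^2 = z^2 and the legs x, y give the stated area."""
    x, y, z = sides
    return x * x + y * y == z * z and x * y / 2 == area


EXAMPLE_5_4 = (F(3, 2), F(20, 3), F(41, 6))
EXAMPLE_5_5 = (F(2017680, 1437599), F(1437599, 168140), F(2094350404801, 241717895860))

if __name__ == "__main__":
    print(is_right_triangle_with_area(EXAMPLE_5_4, 5))
    print(is_right_triangle_with_area(EXAMPLE_5_5, 6))
```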


Examples 5.4 and 5.5 give examples of integer right-angled triangles with 
integral area by clearing fractions, but it is simpler to allow the sides to be 
rational, giving Definition 5.6. 


Exercise 5.5. Find a rational right-angled triangle with area 7. 


Such a triangle was known to Arab mathematicians of the twelfth century 
and rediscovered by Euler in the eighteenth century. 


Definition 5.6. An integer that is the area of a right-angled triangle with 
rational sides is called a congruent number. 


If an integer $n$ is a congruent number and it is divisible by a square, then
the sides of any triangle showing that $n$ is congruent can be scaled accordingly.
Therefore we will assume without further comment that $n$ is square-free in
any discussion about whether it is a congruent number or not.

For millennia it has remained an unsolved problem to find an algorithm
for checking whether a given integer is a congruent number. In recent times,
Tunnell has shown how such an algorithm can be devised (see p. 243). What
is remarkable about his work is the fact that although the proof uses a great
deal of sophisticated twentieth-century mathematics, the way into the proof
is a back-of-an-envelope piece of high-school algebra that goes as follows. If $n$
is a congruent number, then there is a triple of rational numbers $(X, Y, Z)$
with

\[ X^2 + Y^2 = Z^2 \quad \text{and} \quad \tfrac{1}{2}XY = n. \]

These two equations give two further equations,

\[ (X \pm Y)^2 = X^2 \pm 2XY + Y^2 = Z^2 \pm 4n, \]

which can be written

\[ \left( \frac{X \pm Y}{2} \right)^2 = \left( \frac{Z}{2} \right)^2 \pm n. \]

Multiplying the two equations (given by the choice of sign) together gives

\[ \left( \frac{X^2 - Y^2}{4} \right)^2 = \left( \frac{Z}{2} \right)^4 - n^2. \]

Writing $v = (X^2 - Y^2)/4$ and $u = Z/2$, we obtain

\[ v^2 = u^4 - n^2. \]

Now multiply by $u^2$ to obtain

\[ (uv)^2 = u^6 - n^2u^2. \]

Finally, writing $x = u^2$ and $y = uv$, we obtain a rational point $(x, y)$ on the
elliptic curve

\[ y^2 = x^3 - n^2x, \]

so a congruent number $n$ gives rise to a rational point on an elliptic curve
associated with $n$.


Example 5.7. If we start with the $(3, 4, 5)$ triangle, then following the steps
just given, we obtain the rational point $\left(\frac{25}{4}, \frac{35}{8}\right)$ on the curve $y^2 = x^3 - 36x$
(following the convention that $X = 4$ should be the even side).

The curve $y^2 = x^3 - 36x$ has several integral points. In addition to $(0, 0)$
and $(\pm 6, 0)$, there is another, namely $(-3, 9)$. One might wonder if these also
come from right-angled triangles. The answer is no, and Tunnell's Theorem
(Theorem 5.8) suggests why not.
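
The passage from a triangle to a point on the curve is short enough to code directly. This sketch follows the substitutions $u = Z/2$, $v = (X^2 - Y^2)/4$, $x = u^2$, $y = uv$ from the derivation above; the function name is illustrative.

```python
from fractions import Fraction


def triangle_to_point(X, Y, Z):
    """Map a rational right triangle (legs X, Y, hypotenuse Z) to (n, (x, y)).

    Here n = XY/2 is the area and (x, y) lies on y^2 = x^3 - n^2 x.
    """
    X, Y, Z = Fraction(X), Fraction(Y), Fraction(Z)
    n = X * Y / 2
    u = Z / 2
    v = (X * X - Y * Y) / 4
    return n, (u * u, u * v)


if __name__ == "__main__":
    n, (x, y) = triangle_to_point(4, 3, 5)
    print(n, x, y)
    print(y * y == x ** 3 - n * n * x)   # the point satisfies the curve equation
```

Swapping the roles of $X$ and $Y$ changes the sign of $v$, and hence of $y$, but not the x-coordinate.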

Notice that in the construction above, the x-coordinate we obtained turned
out to be the square of a rational. Moreover, the denominator of $x$ must be
even. To see this, remember Theorem 2.1, which determines the Pythagorean
triples. On clearing the denominators in the triple $(X, Y, Z)$, one of $X$ or $Y$
must have an even numerator and $Z$ cannot. Thus the denominator 2 in $u = Z/2$
cannot cancel.

Theorem 5.8. Suppose $n$ is a positive integer and $(x, y)$ denotes a rational
point on the elliptic curve $y^2 = x^3 - n^2x$ with $x$ equal to the square of a
rational with an even denominator. Then $n$ is a congruent number.


PROOF. The proof uses the characterization of Pythagorean triples from
Theorem 2.1. Initially we retrace some of the steps used earlier, but there is an
ingenious twist at the end of the proof. Let $u = \sqrt{x} > 0$; by assumption $u \in \mathbb{Q}$.
Write $v = y/u$ so

\[ v^2 = y^2/u^2 = x(x^2 - n^2)/x = x^2 - n^2. \]

We therefore have a Pythagorean equation,

\[ v^2 + n^2 = x^2. \tag{5.1} \]

Unfortunately, the resulting triangle does not have area $n$. Let $t$ denote the
denominator of $u$; then $t^2$ is the denominator of $x$ and, by Equation (5.1), also
of $v$. Now clear the denominators to obtain a Pythagorean triple $(t^2v, t^2n, t^2x)$.
Since $t$ is even, we can write, for integers $a > b > 0$,

\[ t^2n = 2ab, \quad t^2v = a^2 - b^2, \quad t^2x = a^2 + b^2. \]

We claim there is a right-angled triangle with sides $2a/t$, $2b/t$, and $2u$. This
is easy to see:

\[ \left( \frac{2a}{t} \right)^2 + \left( \frac{2b}{t} \right)^2 = 4(a^2 + b^2)/t^2 = 4t^2x/t^2 = 4x = (2u)^2. \]

The area of this triangle is

\[ \frac{1}{2} \cdot \frac{2a}{t} \cdot \frac{2b}{t} = \frac{2ab}{t^2} = n. \]

Of course, this theorem does not solve the congruent number problem: 
What makes us think we know any more about the rational points on an 
elliptic curve than we do about congruent numbers? In fact, a great deal of 
research about rational points on elliptic curves took place in the twentieth 
century, so reducing a problem to finding rational points on elliptic curves 
allows many deep results to be applied. Even without invoking any of that, 
we already learn something quite surprising from Theorem 5.8. 


Exercise 5.6. Let $P$ be a rational point on the elliptic curve

\[ y^2 = x^3 - n^2x \]

which is neither $(0, 0)$ nor $(\pm n, 0)$. Using the algebraic doubling formula we
used before, show that the x-coordinate of the resulting point is the square
of a rational with an even denominator. Thus, if we can keep doing this, we
obtain (potentially) infinitely many different rational right-angled triangles
with area $n$. This was certainly not obvious from the definition of a congruent
number.

The construction in Theorem 5.8 looked a little unwieldy. The next result
is a neater formulation.


Theorem 5.9. Suppose $n$ is a positive integer and $x \in \mathbb{Q}$ has the property
that $x$, $x + n$, $x - n$ are all rational squares. Put

\[ X = \sqrt{x + n} - \sqrt{x - n}, \quad Y = \sqrt{x + n} + \sqrt{x - n}, \quad Z = 2\sqrt{x}. \]

Then the triangle with sides $X$, $Y$, and $Z$ is a rational right-angled triangle
with area $n$.
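
Theorem 5.9 gives a direct recipe that can be tried on explicit values of $x$. In this sketch the helper `rational_sqrt` (an illustrative name) returns an exact square root of a fraction when one exists, and `triangle_from_x` returns `None` when the hypotheses of the theorem fail.

```python
from fractions import Fraction
from math import isqrt


def rational_sqrt(q):
    """Exact square root of a Fraction, or None if q is not the square of a rational."""
    q = Fraction(q)
    if q < 0:
        return None
    rn, rd = isqrt(q.numerator), isqrt(q.denominator)
    if rn * rn == q.numerator and rd * rd == q.denominator:
        return Fraction(rn, rd)
    return None


def triangle_from_x(x, n):
    """Sides (X, Y, Z) from Theorem 5.9, or None if x, x + n, x - n are not all squares."""
    roots = [rational_sqrt(x + n), rational_sqrt(x - n), rational_sqrt(x)]
    if any(r is None for r in roots):
        return None
    r_plus, r_minus, r = roots
    return (r_plus - r_minus, r_plus + r_minus, 2 * r)


if __name__ == "__main__":
    print(triangle_from_x(Fraction(25, 4), 6))   # recovers the (3, 4, 5) triangle
    print(triangle_from_x(Fraction(-3), 6))      # x = -3 does not satisfy the hypotheses
```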


Exercise 5.7. Confirm the statements in Theorem 5.9.

The shape of the equation defining the elliptic curve

\[ y^2 = x(x + n)(x - n) \]

might lead you to think the conditions of Theorem 5.9 must always be satisfied
for a rational point $(x, y)$. The point $(-3, 9)$ on the curve with $n = 6$ is a
counterexample, however. Subsequently (see Section 7.3) we will come to
understand when the conditions hold in terms of the group-theoretic structure
of the curve.


Exercise 5.8. Take $n = 6$ and $P = \left(\frac{25}{4}, \frac{35}{8}\right)$. Find the rational right-angled
triangle of area 6 corresponding to $2P$. Find the triangle corresponding to $4P$.


Exercise 5.9. Find a rational point $P$ other than $(0, 0)$ or $(\pm 5, 0)$ on the
curve $y^2 = x^3 - 25x$. Use $P$ to find a rational right-angled triangle of area 5
different from Example 5.4.


The hard part of all this is to understand when rational points of the right
kind exist in the first place. It is somewhat easier to show that as long as a
rational point is not $(0, 0)$ or $(\pm n, 0)$, then one can go on constructing others
with the right properties to guarantee the existence of many rational
right-angled triangles with area $n$. Thus the problem comes down to finding for
which $n$ there are any nontrivial rational points. A satisfactory resolution of
this problem has recently been given (see p. 243, where Tunnell's Theorem
is stated). We will go on now to relate the geometric construction given before
to the existence of a group structure on the curve.


Exercise 5.10. The geometric addition on elliptic curves allows us to
construct new rational right-angled triangles from existing ones. In this exercise,
the same construction is carried out directly on the triangle. Let $(x, y, z)$ be
a Pythagorean triple with $x < y < z$. This construction will find another
Pythagorean triple $(X, Y, Z)$ with

\[ X = \frac{y^2 - x^2}{2z}, \quad Y = \frac{2xyz}{y^2 - x^2}, \quad Z = \frac{x^4 + y^4 + 6x^2y^2}{2z(y^2 - x^2)}. \]

Let $P_x$, $P_y$, $P_z$ denote the vertices of the triangle, opposite the sides $x$, $y$ and $z$
respectively. Draw a circle with center $P_z$ and radius $x$, and let $Q$ be the point
on this circle where a tangential line from $P_x$ meets the circle (see Figure 5.4).
Extend the line $P_xP_y$ to a point $R$ at a distance $2z$ from $P_x$. Now draw a circle
with center $P_x$ through $Q$, and call $S$ the point of intersection between the
circle and the line $P_xP_y$. Finally, draw a line through $S$ parallel to $QR$, and
let $T$ be the intersection of this line with $P_xQ$.


Figure 5.4. Constructing a new Pythagorean triple.


Prove that the distance from $P_x$ to $T$ is $X = \frac{y^2 - x^2}{2z}$, and show how to
continue the construction to find the length $Y$.
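
The formulas of Exercise 5.10 can be iterated numerically. The sketch below assumes the expressions $X = (y^2 - x^2)/(2z)$, $Y = 2xyz/(y^2 - x^2)$ and $Z = (x^4 + y^4 + 6x^2y^2)/(2z(y^2 - x^2))$; they are written out in the code so that the block is self-contained, and the function name is illustrative. The legs are sorted first so that $x < y$, as the exercise requires.

```python
from fractions import Fraction


def next_triple(x, y, z):
    """New rational right triangle with the same area, from legs x, y and hypotenuse z."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    if x > y:
        x, y = y, x                      # the formulas assume x < y
    diff = y * y - x * x
    X = diff / (2 * z)
    Y = 2 * x * y * z / diff
    Z = (x ** 4 + y ** 4 + 6 * x * x * y * y) / (2 * z * diff)
    return X, Y, Z


if __name__ == "__main__":
    t = (3, 4, 5)
    for _ in range(2):
        t = next_triple(*t)
        X, Y, Z = t
        # Each line: the triple, whether it is right-angled, and its area.
        print(t, X * X + Y * Y == Z * Z, X * Y / 2)
```

Starting from $(3, 4, 5)$, two steps give triangles of area 6 with sides $(7/10, 120/7, 1201/70)$ and then the triple of Example 5.5.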


Example 5.10. Consider the curve $y^2 = x^3 - 36x$ and the point $P = \left(\frac{25}{4}, -\frac{35}{8}\right)$
on the curve. Then $P$ is a rational point of infinite order, and we compute
that

\[ 2P = \left( \frac{1442401}{19600}, \; \frac{1726556399}{2744000} \right) \]

and

\[ 4P = \left( \frac{4386303618090112563849601}{233710164715943220558400}, \; \frac{870369109085580828275935650626254401}{11298385812463619737216684496448000} \right). \]

The elliptic curve $y^2 = x^3 - 36x$ allows other rational right triangles with
area 6 to be computed: Using the points above, one finds the right-angled
triangles with sides

\[ \left( \frac{120}{7}, \; \frac{7}{10}, \; \frac{1201}{70} \right) \]

and

\[ \left( \frac{2017680}{1437599}, \; \frac{1437599}{168140}, \; \frac{2094350404801}{241717895860} \right), \]

each of which has area 6 (see Exercise 5.8 on p. 102).
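
The doubling in Example 5.10 can be reproduced with the tangent construction. This sketch is restricted to curves of the form $y^2 = x^3 - n^2x$ and to points with $y \ne 0$. It assumes the starting point $P = (25/4, -35/8)$, the sign for which the computed $2P$ has positive y-coordinate as displayed; the function name is illustrative.

```python
from fractions import Fraction


def double(P, n):
    """2P on y^2 = x^3 - n^2 x via the tangent at P (requires y != 0)."""
    x, y = P
    lam = (3 * x * x - n * n) / (2 * y)   # slope of the tangent at P
    x2 = lam * lam - 2 * x                # sum of roots: x2 + x + x = lam^2
    y2 = -(y + lam * (x2 - x))            # reflect the third intersection point
    return (x2, y2)


if __name__ == "__main__":
    P = (Fraction(25, 4), Fraction(-35, 8))
    P2 = double(P, 6)
    P4 = double(P2, 6)
    print(P2)
    print(P4)
    # The x-coordinate of 4P is the square of half the hypotenuse of the second triangle.
    print(P4[0] == (Fraction(2094350404801, 241717895860) / 2) ** 2)
```

The growth in the size of the coordinates from $P$ to $2P$ to $4P$ is already visible in the printed fractions.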


The arithmetic complexity of the rational points in Example 5.10 seems
to grow enormously, just as we saw in Example 5.1 on p. 95, and we want to
quantify this growth in complexity. As we saw stated in Theorem 5.2, there is
a quadratic-exponential growth in the size of the denominators. To make this
more precise, we will use a naive notion of "height" on elliptic curves over the
rationals, which allows us to measure how rational points grow in complexity
under maps such as $P \mapsto 2P$. This notion of height was introduced by Mordell
with the specific aim of proving the following theorem, which was conjectured
by Poincaré. The proof will exercise us considerably in Chapter 7.

For an elliptic curve $E$ defined over the rationals, denote by $E(\mathbb{Q})$ the set of
points on $E$ with rational coordinates, together with the point "at infinity". The
geometrical addition law makes $E(\mathbb{Q})$ into a group, and Mordell's Theorem
says something about the structure of this group.


Theorem 5.11. [MORDELL'S THEOREM] Let $E$ denote an elliptic curve
defined over $\mathbb{Q}$. Then $E(\mathbb{Q})$ is a finitely generated Abelian group.


A complete proof of this theorem may be found in the references at the end 
of the chapter. In Section 7.2 we will show how it follows from the so-called 
weak Mordell Theorem, and then in Section 7.3 will prove the weak Mordell 
Theorem in a special case. 

Later developments have placed this result in a more general context. Al- 
gebraic curves have an integer parameter called the genus, which measures the 
topological complexity of the underlying complex space. For an elliptic curve, 
the fundamental domain (this will be defined in Chapter 6 on p. 122) can be 
wrapped up into a torus (or doughnut) that is topologically a sphere with one 
handle. Roughly speaking, the genus counts the complexity in this topologi- 
cal sense when the underlying field of definition is the complex numbers. One 
of the great challenges facing mathematicians during the last century was to 
give a properly precise definition of genus when the base field is arbitrary. 
Remarkably, the genus of a curve seems to govern how many rational points 
it will have. Elliptic curves have genus one, giving a finitely generated group 
of rational points. Curves of genus greater than one have only finitely many 
rational points by a deep result of Faltings. 

Theorem 5.11 means that E(Q) is isomorphic to $\mathbb{Z}^r \times F$ for some $r \in \mathbb{N}$
and finite group $F$. The number r is called the rank of the curve, and it is
conjectured that for any $r \in \mathbb{N}$ there is a curve defined over the rationals with
rank r. The possibilities for the finite group $F$ are more constrained; we will
describe some of this in Section 5.4.


5.3 Explicit Formulas 


In this section we will turn the geometric notion of addition on an elliptic curve 
into an algebraic formulation that allows computations to be made. We are 
going to work with a special form of cubic equation throughout this section. 
Subsequently, we will explain how the different forms of equation relate to 
each other. As a warm-up, we recommend the following exercise. 


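Exercise 5.11. Let $p(x) = x^3 + ax^2 + bx + c = (x - \lambda_1)(x - \lambda_2)(x - \lambda_3)$. Find
expressions in a, b and c for $\lambda_1\lambda_2\lambda_3$, $\lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3$, and $\lambda_1 + \lambda_2 + \lambda_3$.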


Given points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ on the elliptic curve
\[ y^2 = x^3 + ax + b, \]
explicit formulas may be found for $x_3$ and $y_3$, where $P_1 + P_2 = (x_3, y_3)$.

Case I: If $x_1 \neq x_2$, then the line joining $P_1$ to $P_2$ has equation
\[ \frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}, \]
so
\[ y = \underbrace{\left(\frac{y_2 - y_1}{x_2 - x_1}\right)}_{\alpha} x
     + \underbrace{\left(\frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}\right)}_{\beta}. \]
Substituting this into the equation
\[ y^2 = x^3 + ax + b \]
for the curve gives
\[ (\alpha x + \beta)^2 = x^3 + ax + b, \]
whose roots are the x-coordinates $x_1, x_2, x_3$ of the three points of intersection
with the curve. By the sum of roots formula in Exercise 5.11, we must have
\[ x_1 + x_2 + x_3 = \alpha^2, \]
so
\[ x_3 = \alpha^2 - x_1 - x_2 = \left(\frac{y_2 - y_1}{x_2 - x_1}\right)^2 - x_1 - x_2. \]
Reflecting in the x-axis gives $P_3$, so
\[ y_3 = -\alpha x_3 - \beta
       = -\left(\frac{y_2 - y_1}{x_2 - x_1}\right) x_3 - \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}, \]
or
\[ y_3 = \alpha(x_1 - x_3) - y_1. \]


Case II: Assume that $x_1 = x_2$ and $y_1 = y_2 \neq 0$. Let $y = \alpha x + \beta$ be the
equation of the tangent to the curve at $(x_1, y_1)$. By implicit differentiation of the
equation $y^2 = x^3 + ax + b$, we obtain
\[ \alpha = \frac{3x_1^2 + a}{2y_1}, \]
and hence
\[ x_3 = \alpha^2 - 2x_1 = \left(\frac{3x_1^2 + a}{2y_1}\right)^2 - 2x_1, \]
so $y_3 = \alpha(x_1 - x_3) - y_1$.

Case III: If $x_1 = x_2$ and $y_1 = -y_2$, then $P_1 = -P_2$ so $P_3$ is the point at
infinity.

Notice that all the formulas are rational functions (quotients of polyno- 
mials) with coefficients in the same field as a and b. This suggests there is 
a closure property as follows. Let L denote any field over which the curve is 
defined, and write E(L) for the set of points with coordinates in L together
with the point at infinity. Then $P_1, P_2 \in E(L)$ implies that $P_1 + P_2 \in E(L)$.
Thus the group operation is well-defined on E(L). Actually, some care needs 
to be taken if the characteristic of L is 2 or 3, starting with a different form 
of equation. We will discuss this further in Section 5.3.2. 


5.3.1 Torsion Points 


Later, we will give a more precise explanation of the identity element for the 
group operation. For the moment, we continue to think of the identity as the 
point at infinity, so an equation such as 2P = 0 on the curve E means that a
vertical line is a tangent to E at the point P. This allows us to speak of torsion
points on an elliptic curve E: P is a point of order dividing n if nP = 0 in
this geometrical sense. As we will see, the geometrical definition really gives a 
group structure to the points on the elliptic curve, and thus the usual terms 
from group theory such as “torsion” and “order” can be applied. 


Example 5.12. Consider the curve $E : y^2 = x^3 + 1$, and let $P = (2, 3)$. Using
the formulas, we find
\[ 2P = (0, 1), \quad 3P = 2P + P = (-1, 0), \quad 4P = 3P + P = (0, -1) = -2P. \]
It follows that 6P = 0 (so P is a torsion point with respect to the group
structure on the curve), and since $P, 2P, 3P \neq 0$, the point P has order 6 (see
Figure 5.5).

Figure 5.5. The point $P = (2, 3)$ has order 6 on $y^2 = x^3 + 1$.
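The three cases of Section 5.3 translate directly into a short program. The following is a minimal sketch in Python, assuming the curve is given as $y^2 = x^3 + ax + b$ over the rationals; the names `O`, `add` and `multiples` are illustrative choices and do not come from the text. It reproduces the multiples of $P = (2, 3)$ found in Example 5.12.

```python
from fractions import Fraction

O = None  # illustrative marker for the point at infinity (the identity)

def add(P, Q, a):
    """Add two points on y^2 = x^3 + a*x + b using Cases I-III.

    The coefficient b is not needed by the formulas themselves.
    """
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        return O                                   # Case III (covers y1 = y2 = 0)
    if x1 != x2:
        alpha = Fraction(y2 - y1) / Fraction(x2 - x1)      # Case I: chord slope
    else:
        alpha = Fraction(3 * x1 * x1 + a) / Fraction(2 * y1)  # Case II: tangent slope
    x3 = alpha * alpha - x1 - x2
    y3 = alpha * (x1 - x3) - y1
    return (x3, y3)

def multiples(P, a, n):
    """Return the list [P, 2P, ..., nP]."""
    out, Q = [], O
    for _ in range(n):
        Q = add(Q, P, a)
        out.append(Q)
    return out

# Example 5.12: E: y^2 = x^3 + 1 (so a = 0) and P = (2, 3).
pts = multiples((2, 3), 0, 6)
for k, Q in enumerate(pts, start=1):
    print(k, "O" if Q is O else (str(Q[0]), str(Q[1])))
```

Exact rational arithmetic (`Fraction`) is used so that no rounding can hide a point at infinity; the same function can be tried on the points in Exercises 5.12 and 5.13.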


Exercise 5.12. Find the order of the point (3, 8) on the elliptic curve
$y^2 = x^3 - 43x + 166$.
Exercise 5.13. Find the order of the point (0, 16) on the elliptic curve
$y^2 = x^3 + 256$.
Exercise 5.14. Find the order of the point (5, 5) on the elliptic curve 


2) 8c 
yo =a + Fa. 


Exercise 5.15. Find the order of the point (-3, 5) on the elliptic curve 


me Apa 
ie gt + ios: 
Exercise 5.16. Suppose that K denotes any field with characteristic not
equal to 2 or 3, and $E : y^2 = x^3 + ax + b$ ($a, b \in K$). Assuming the binary
operation defined before makes E(K) into a group, prove that $P = (x, y)$
has order 2 if and only if y = 0.


Exercise 5.17. Suppose $1 < n \in \mathbb{N}$ and consider the elliptic curve
\[ E : y^2 = x^3 - n^2 x. \]
Prove that there are only two real points of order 3 in E(R). Mark these points
on a graph of the curve. (They are points of inflexion.) It may be interesting 
to look at Example 7.2 on p. 134. 


Exercise 5.18. Use your graph from Exercise 5.17 to show that the subgroup
of real points on $y^2 = x^3 - n^2 x$ with order dividing 4 is isomorphic to $C_2 \times C_4$.

Exercise 7.5 on p. 138 describes a useful result that allows all the ratio- 
nal torsion points on an integral elliptic curve to be effectively determined. 
When K = Q, there are not many possibilities for the orders of torsion points 
in E(Q). For example, in Section 5.4 we show that there are no points of 
order 11 on any elliptic curve defined over the rationals (assuming a difficult 
but, in principle, elementary result from Diophantine equations). On the other 
hand, the complex torsion points on an elliptic curve are easy to describe once 
we have the necessary function theory — see Section 6.3. 


5.3.2 The Equation Defining an Elliptic Curve 


At several points we have used equations of differing shapes to define an elliptic 
curve. In the statement of Siegel’s Theorem (Theorem 2.13) we set y? equal to 
a cubic in x with no repeated zeros. The addition formulas in the last section 
were computed using a special type of cubic. It is fair to ask just what is the 
correct definition in general. In Chapter 6 we will see that a pair of complex 
functions parametrize a curve of the shape $y^2 = x^3 + ax + b$ in which the
right-hand side has no repeated zeros. Because of Weierstrass's important work
in the area, this equation became known as a Weierstrass equation or Weierstrass
model. We will see that the geometric definition of addition does indeed impose 
a group structure on the complex solutions of that equation. However, the 
explicit formulas define a group structure over any field of characteristic other 
than 2 or 3 (as does the geometrical definition of the group operation when 
a suitable notion of tangency is developed). We will have to ask you to take 
this statement on trust, or apply the Lefschetz principle.³


3 The Lefschetz principle says, in effect, that if an algebraic formula holds in C, then 
it will hold in any field where it makes sense. Although this is a valid principle, 
generally it is best used as a pointer toward phenomena that deserve to be bet- 
ter understood, rather in the way that algebraic geometers came to understand 
elliptic curves and their generalizations. 
Suppose then that F denotes any field. It is possible to develop a theory of
elliptic curves from one equation, regardless of the characteristic. When using 
the Weierstrass equation in characteristic 2 or 3, one cannot define tangents 
adequately (look at what happens when you differentiate). Tate used a more 
general equation with the following shape: 


\[ E : y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6 \tag{5.2} \]

with $a_1, a_2, a_3, a_4, a_6 \in F$ satisfying the non-degeneracy condition that every
point on the curve has a unique tangent. This condition is equivalent to the
non-vanishing of a complicated polynomial expression. For the special case in
which Equation (5.2) takes the form $y^2 = x^3 + ax + b$, this non-degeneracy
condition is equivalent to the cubic having no repeated zeros, and therefore
to the condition that $4a^3 + 27b^2 \neq 0$ (see Exercise 2.14). The addition
formulas can all be worked out for this general equation, in any characteristic.
However, the formulas are significantly more complicated and this can hinder 
the development of intuition about the group law. This is why we prefer to 
develop the theory for the Weierstrass equation. The Equation (5.2) became 
known as a generalized Weierstrass equation, although it is becoming usual 
to refer to this too as a Weierstrass equation. The reader should beware that 
the modern literature on elliptic curves tends to work with the generalized 
equation. 

The following exercise shows how gory the associative law can be when 
expressed in terms of the algebraic formula, even for the simplest form of 
equation. 


Exercise 5.19. Using just the Weierstrass equation $y^2 = x^3 + ax + b$, verify the
associative law for addition on an elliptic curve using the algebraic formulas
from Section 5.3. Different formulas are required depending upon whether
the x-coordinates are equal or not. Even doing one special case of
\[ P + (Q + R) = (P + Q) + R \]
is tiresome and requires a great deal of both paper and patience.


Although we do not have the space to develop the algebraic geometry 
needed to properly develop a theory of elliptic curves over arbitrary fields, we 
recommend doing the following exercise to get a feel for elliptic curves over a 
finite field. 


Exercise 5.20. Let E denote the elliptic curve $y^2 = x^3 - 2$. Find the order
of the point (3, 5) in the group $E(\mathbb{F}_7)$. What is the order of $E(\mathbb{F}_7)$? Do the
same over other fields $\mathbb{F}_p$ for primes p. Can you detect any restrictions on
the resulting group orders? For a precise result on this theme consult Hasse’s
Theorem (Theorem 11.11 on p. 240).
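Experiments of the kind suggested in Exercise 5.20 are easy to automate. The sketch below, in Python, assumes a curve $y^2 = x^3 + ax + b$ over $\mathbb{F}_p$ with p a prime greater than 3 that does not divide $4a^3 + 27b^2$; the function names are illustrative and the point count is a naive search over all pairs (x, y), which is adequate only for small p.

```python
def points_mod_p(a, b, p):
    """All affine points on y^2 = x^3 + a*x + b over F_p (naive search)."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x ** 3 + a * x + b)) % p == 0]

def add_mod_p(P, Q, a, p):
    """Cases I-III of Section 5.3 with arithmetic modulo p; None is the identity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                          # Case III
    if x1 != x2:
        alpha = (y2 - y1) * pow(x2 - x1, -1, p) % p          # Case I
    else:
        alpha = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # Case II
    x3 = (alpha * alpha - x1 - x2) % p
    y3 = (alpha * (x1 - x3) - y1) % p
    return (x3, y3)

def order(P, a, p):
    """Smallest n >= 1 with nP equal to the identity."""
    n, Q = 1, P
    while Q is not None:
        Q = add_mod_p(Q, P, a, p)
        n += 1
    return n

# The curve of Exercise 5.20 is y^2 = x^3 - 2, so a = 0 and b = -2.
# Its discriminant is divisible only by 2 and 3, so primes p >= 5 are safe.
for p in (5, 7, 11, 13, 17, 19):
    group_order = len(points_mod_p(0, -2, p)) + 1   # +1 for the point at infinity
    print(p, group_order, p + 1 - group_order)
```

The last column printed is the difference between p + 1 and the group order, which is the quantity controlled by Hasse's Theorem. Modular inverses use the three-argument `pow` with exponent -1, which requires Python 3.8 or later.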


In several respects the group E(F), where F denotes a finite field, can be
studied along the lines that we studied $F^*$. The two groups will often exhibit
properties that can be directly related, and this phenomenon is useful in
cryptography and coding theory. Earlier we proved that $F^*$ is always a cyclic
group. Therefore a natural question is to ask for the structure of E(F).


Exercise 5.21. *Let F be a finite field. Prove that E(F) is always a cyclic 
group or a direct product of two cyclic groups. Find an example where the 
group has two nontrivial cyclic factors. 


5.4 Points of Order Eleven 


The structure of the points of finite order in the group E(Q) for an elliptic 
curve defined over the rationals is very constrained: A deep result of Mazur 
says that the torsion subgroup of E(Q) must be isomorphic to $\mathbb{Z}/n\mathbb{Z}$ for
some n, $1 \leq n \leq 12$, $n \neq 11$, or to $\mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/2n\mathbb{Z}$, $1 \leq n \leq 4$. Proving this
important result requires more material, but we can exhibit one nontrivial 
constraint (assuming a difficult Diophantine result and using some elementary 
properties of the geometry of the rational projective plane P?(Q)). If you have 
not encountered projective space, postpone this section until you have read 
Section 6.2. In what follows, we use little more than the geometric definition 
of addition on an elliptic curve to paint a putative rational point of order 11 
into a corner where it cannot exist. 


Theorem 5.13. If E is an elliptic curve defined over Q, then E(Q) has no 
point of order 11. 


PROOF. Assume that P is a point in E(Q) with order 11. Then no three
points of S = {0, P,3P,4P} could lie on a straight line because if A, B,C are 
collinear then A+ B+C =0 by the geometric definition of group addition. 
Since P has order 11, this last equation is impossible for three distinct points 
from S. 

It follows that there is a nonsingular linear map on $\mathbb{P}^2(\mathbb{Q})$ sending
\[ 0 \mapsto [0,1,0], \quad P \mapsto [1,0,0], \quad 3P \mapsto [0,0,1], \quad\text{and}\quad 4P \mapsto [1,1,1]. \]
To see this, notice first that of the four points
\[ [0,1,0],\ [1,0,0],\ [0,0,1],\ [1,1,1], \]
no three are collinear, by checking the various determinants. Given any four
points with homogeneous coordinates $v_1, v_2, v_3, v_4$, the matrix
\[ M = [\, a v_1^{T} \mid b v_2^{T} \mid c v_3^{T} \,] \]
will, for any $a, b, c \neq 0$, send
\[ [1,0,0] \mapsto v_1, \quad [0,1,0] \mapsto v_2, \quad [0,0,1] \mapsto v_3, \quad\text{and}\quad [1,1,1] \mapsto a v_1 + b v_2 + c v_3. \]
The equation $a v_1 + b v_2 + c v_3 = v_4$ has a unique solution with a, b, c all
nonzero by the non-collinearity assumption. Thus, by applying a change of
variables in $\mathbb{P}^2(\mathbb{Q})$, we may assume that $0 = [0,1,0]$, $P = [1,0,0]$, $3P = [0,0,1]$,
and $4P = [1,1,1]$.

Now let $5P = [x_1, x_2, x_3]$. Then, if $\ell_1$ is the line through 5P and 0, and $\ell_2$
is the line through 4P and P, $-5P \in \ell_1 \cap \ell_2$. Thus
\[ r[0,1,0] + s[x_1, x_2, x_3] = t[1,0,0] + w[1,1,1], \]
for some $r, s, t, w \in \mathbb{Q}$. Comparing coefficients shows that
\[ s x_1 = t + w; \quad s x_2 + r = w; \quad s x_3 = w. \]
If s = 0, then P = 0, which is impossible, so without loss of generality we
may put s = 1. Then $r = x_3 - x_2$, and so
\[ -5P = r[0,1,0] + s[x_1, x_2, x_3] = [x_1, x_3, x_3]. \]


Similar arguments show that
\[ -4P = [1, 0, 1], \]
\[ -P = [x_1 - x_3,\ x_2,\ 0], \]
\[ -3P = [0,\ x_3 - x_1 + x_2,\ x_3 - x_1], \quad\text{and} \]
\[ 2P = [x_1 x_3 - x_1^2 + x_1 x_2,\ x_3^2 - x_1 x_3 + x_2 x_3,\ x_3^2 - x_1 x_3]. \]


Since 11P = 0, the points 5P,4P,2P are collinear. Taking the determinant 
of the matrix whose rows are the coefficients of these points, it follows that 


\[ x_3^3 - x_1^2 x_2 + x_1^2 x_3 + x_1 x_2^2 - 2 x_1 x_3^2 = 0. \tag{5.3} \]
We claim that the only rational solutions to Equation (5.3) are
\[ [0,1,0],\ [1,1,1],\ [1,0,0],\ [1,0,1],\ [1,1,0]. \]


The notes at the end of the chapter provide references where this difficult 
result is proved. The point 5P must correspond to one of these possibilities. 
It cannot be [0,1,0] because this is 0 and $5P \neq 0$. It cannot be [1,1,1] because
this is 4P and 5P = 4P implies P = 0. Similarly, it cannot be [1,0,0] because
this is P and 5P = P implies 4P = 0. It cannot be [1,0,1] because this is $-4P$
and $9P \neq 0$. It cannot be [1,1,0] because this is $-P$ and $6P \neq 0$.
The contradiction proves that there can be no such point P. 
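The determinant computation in the proof can be checked mechanically. The sketch below, in Python, assumes the coordinates $2P = [x_1 x_3 - x_1^2 + x_1 x_2,\ x_3^2 - x_1 x_3 + x_2 x_3,\ x_3^2 - x_1 x_3]$ and takes Equation (5.3) to be $x_3^3 - x_1^2 x_2 + x_1^2 x_3 + x_1 x_2^2 - 2x_1 x_3^2 = 0$; the function names are illustrative. It compares the determinant of the matrix with rows 5P, 4P, 2P against this cubic on a grid of integer points, and confirms that the five listed points are solutions. It does not address the difficult claim that these are the only rational solutions.

```python
from itertools import product

def det3(r1, r2, r3):
    """Determinant of the 3x3 matrix with rows r1, r2, r3."""
    return (r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
            - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
            + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0]))

def two_P(x1, x2, x3):
    """Coordinates of 2P in terms of 5P = [x1, x2, x3], as in the proof."""
    return (x1 * x3 - x1 ** 2 + x1 * x2,
            x3 ** 2 - x1 * x3 + x2 * x3,
            x3 ** 2 - x1 * x3)

def cubic(x1, x2, x3):
    """Left-hand side of Equation (5.3)."""
    return x3 ** 3 - x1 ** 2 * x2 + x1 ** 2 * x3 + x1 * x2 ** 2 - 2 * x1 * x3 ** 2

# Collinearity of 5P, 4P, 2P is the vanishing of this determinant.
agree = all(
    det3((x1, x2, x3), (1, 1, 1), two_P(x1, x2, x3)) == cubic(x1, x2, x3)
    for x1, x2, x3 in product(range(-4, 5), repeat=3)
)
solutions = [(0, 1, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0)]
print(agree, [cubic(*s) for s in solutions])
```

Agreement on a 9 by 9 by 9 grid is more than enough to identify two polynomials of degree at most 3 in each variable, so the check is conclusive for the identity itself.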
5.5 Prime Values of Elliptic Divisibility Sequences


Elliptic curves generate a family of integer sequences that relate to several 
interesting parts of mathematics, including graph theory and cryptography. 
Suppose the elliptic curve E has a nontorsion point $P \in E(\mathbb{Q})$. Write
\[ x(nP) = \frac{A_n}{B_n^2} \tag{5.4} \]
in lowest terms, with $A_n$ and $B_n$ in $\mathbb{Z}$. An elliptic analog of the question about
Mersenne primes asks how often $B_n$ is prime as n varies. Because $B_n$ grows
so rapidly, this is potentially a method to find very large prime numbers.


Example 5.14. Let
\[ E : y^2 = x^3 + 26, \quad P = (-1, 5). \]
The term $B_{29}$ is a prime with 286 decimal digits.


Example 5.15. Let
\[ E : y^2 = x^3 + 15, \quad P = (1, 4). \]
The term $B_{41}$ is a prime with 510 decimal digits.


In some respects, this method for producing primes mirrors the situation 
with sequences such as the Mersenne and Fibonacci sequences, which are 
expected to produce large primes. For many years, the largest known primes 
have come from the Mersenne sequence. However, numerical investigation 
suggests that, for fixed E and P, the sequence $(B_n)$ should only contain
finitely many primes, and a non-rigorous probabilistic argument⁴ suggests
the number of prime terms should be uniformly bounded.

Just like the Mersenne and Fibonacci sequences, the sequence $(B_n)$ is a
divisibility sequence, meaning that $B_m \mid B_n$ whenever $m \mid n$. A consequence of
this property, together with the rapid growth rate, is that there can only be
finitely many primes in the sequence $(B_n)$ if P is the multiple of another point,
or if P is a non-integral point; moreover the terms $B_n$ for large n cannot be
prime if the index n is not itself prime. We say that a rational point is a
generator if it is not the multiple of any other rational point.

Let E and E′ be two elliptic curves defined over Q. An isogeny is a nonzero
homomorphism defined by rational functions on the coordinates of the points:
\[ \phi : E \to E'. \]
Taking E = E′, the multiplication-by-n map $P \mapsto nP$ for nonzero $n \in \mathbb{Z}$ is an
example of an isogeny. The isogeny has an integral degree $m \geq 1$, which is the
degree of the underlying rational functions that define it.

⁴ Crudely, the Prime Number Theorem (Theorem 8.1) implies that the probability
that a large integer N is prime is approximately $1/\log N$. The expected number
of prime terms $B_n$ with $n \leq x$ is (speculatively) approximately $\sum_{n \leq x} 1/\log B_n$.
By Theorem 5.2 this sum converges as $x \to \infty$. It is known that the quantity h
appearing in Theorem 5.2 is uniformly bounded below by some positive constant
independent of the initial nontorsion rational point P and curve E defined over
the rationals, provided the starting equation has minimal $\Delta$.


Exercise 5.22. *Prove that the degree of the isogeny $P \mapsto nP$ is $n^2$.


The curves E and E′ are said to be m-isogenous if there is an isogeny of
degree m between them. It can be proved that the multiplication-by-n map 
can be factorized as a composition of two isogenies, each of degree n. 


Definition 5.16. We say the point $P \in E(\mathbb{Q})$ is magnified if it is the image
of a rational point under an isogeny of degree m > 1. 


The term was chosen because the height of a point increases under such a 
map — see Chapter 7 for more details about heights. The following result of 
Everest, Miller and Stephens will not be proved here. 


Theorem 5.17. If $P \in E(\mathbb{Q})$ is a magnified point, then $B_n$ is a prime power
for only finitely many n.


Example 5.18. (1) The curve
\[ y^2 = x^3 + x^2 - 4x \]
is 2-isogenous to the curve in Weierstrass form,
\[ E : y^2 = x^3 + x^2 + 16x + 16. \]
The generator (−2, 2) maps to the generator P = (0, 4) on E. Thus the
sequence of denominators for P on E contains only a finite number of prime
powers.
(2) The curve
\[ y^2 = x^3 - 9x + 9 \]
is 3-isogenous to the curve in Weierstrass form,
\[ E : y^2 = x^3 - 189x - 999. \]
The generator (1, 1) maps to the generator P = (−8, 1) on E. Thus the
sequence of denominators for P on E contains only a finite number of prime
powers.


Call the number of distinct prime divisors of an integer its length. The 
following conjecture has arisen from work of Everest and King. 


Conjecture 5.19. Given a fixed bound on the length, there are only finitely 
many terms $B_n$ with length below that bound.

5.5.1 The curve $u^3 + v^3 = D$


This section shows that the primality question can be answered in complete 
generality for curves in homogeneous form.


Theorem 5.20. Suppose E denotes a curve defined by an equation
\[ u^3 + v^3 = D \tag{5.5} \]
for some nonzero $D \in \mathbb{Q}$. Let P denote a nontorsion Q-rational point. Write,
in lowest terms,
\[ P = \left(\frac{A_P}{B_P}, \frac{C_P}{B_P}\right). \]
Then the integers $B_P$ are prime powers for only finitely many Q-points P.


Note that the shape of the rational points is slightly different; the denom- 
inators of the x and y coordinates are not compelled to be powers. These 
curves, although not in the form to which we are accustomed, are still elliptic 
curves. The geometric addition used before works here and defines a group. 
As we shall see, a simple transformation puts them into the more usual form. 


Example 5.21. As Ramanujan famously pointed out, the taxicab equation⁵
\[ x^3 + y^3 = 1729 \tag{5.6} \]
has two distinct integral solutions. These give rise to points 
P = (1,12) and Q = (9,10) 


on the elliptic curve defined by Equation (5.6). The only rational points 
on Equation (5.6) that seem to yield prime denominators are 2Q and P+ Q 
(and their inverses). 


PROOF OF THEOREM 5.20. There is a transformation between the homogeneous
model given by Equation (5.5) and the Weierstrass model,
\[ y^2 = x^3 - 2^4 3^3 D^2. \]
The transformations are given by
\[ x = \frac{2^2 3 D}{u + v}, \qquad y = \frac{2^2 3^2 D (u - v)}{u + v}; \]
\[ u = \frac{2^2 3^2 D + y}{6x}, \qquad v = \frac{2^2 3^2 D - y}{6x}. \]
Writing $x = X/Z^2$ and $y = Y/Z^3$, where $\gcd(X, Z) = \gcd(Y, Z) = 1$, it follows
that
\[ u = \frac{2^2 3^2 D Z^3 + Y}{6XZ}. \]
If X divides the numerator of u, then X divides $2^6 3^3 D^2$. By Siegel’s Theorem
(Theorem 2.13), this can only happen finitely often. Since Z is coprime to the
numerator, apart from a finite number of points, the denominator of u always
has two nontrivial coprime factors. □

⁵ Srinivasa Ramanujan was a largely self-taught mathematical genius. According
to C. P. Snow, on one of G. H. Hardy’s visits to Ramanujan in the hospital in
Putney, Hardy said “I thought the number of my taxicab was 1729. It seemed to
me rather a dull number.” To which Ramanujan replied, “No, Hardy! It is a very
interesting number. It is the smallest number expressible as the sum of two cubes
in two different ways.”
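The change of variables between $u^3 + v^3 = D$ and $y^2 = x^3 - 432D^2$ (note $432 = 2^4 3^3$) can be tested on the two integral points of the taxicab curve from Example 5.21. This is a minimal sketch in Python using exact rational arithmetic; the function names `to_weierstrass` and `to_cubic` are illustrative.

```python
from fractions import Fraction

def to_weierstrass(u, v, D):
    """Map a point (u, v) on u^3 + v^3 = D to (x, y) on y^2 = x^3 - 432*D^2."""
    s = Fraction(u + v)
    return 12 * D / s, 36 * D * (u - v) / s

def to_cubic(x, y, D):
    """Inverse map from the Weierstrass model back to u^3 + v^3 = D."""
    return (36 * D + y) / (6 * x), (36 * D - y) / (6 * x)

D = 1729
images = {}
for u, v in [(1, 12), (9, 10)]:
    x, y = to_weierstrass(u, v, D)
    images[(u, v)] = (x, y)
    on_curve = (y ** 2 == x ** 3 - 432 * D ** 2)
    round_trip = (to_cubic(x, y, D) == (u, v))
    print((u, v), "->", (x, y), on_curve, round_trip)
```

Both taxicab points land on integral points of the Weierstrass model here because u + v divides 12D in each case; in general the image has rational coordinates.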


Exercise 5.23. Prove that any integer solutions to the equation $u^3 + v^3 = D$
have $\max\{|u|, |v|\} \leq 2\sqrt{|D|/3}$.


5.5.2 Higher Rank Considerations 


Let E denote an elliptic curve, defined over Q. We say rational points P and Q 
are independent if no integer linear combination mP + nQ can represent the 
point at infinity unless m= n= 0. 


Theorem 5.22. Let E denote an elliptic curve, defined over Q, and suppose 
that P and Q denote independent rational points both of which are magnified 
under the same isogeny. Write 

\[ x(nP + mQ) = \frac{A_{n,m}}{B_{n,m}^2}. \tag{5.7} \]
Then there are only finitely many pairs (m, n) for which $B_{n,m}$ is prime.


This theorem will not be proved here. Examples of the phenomenon of 
simultaneous magnification under the same isogeny are not easy to find: The 
following example uses the generalized Weierstrass form (5.2). 


Example 5.23. The elliptic curve
\[ y^2 + xy = x^3 + x^2 - 156x + 2070 \]


has independent generators P = (3,39) and Q = (13,43) that are magnified 
under the same 2-isogeny. 


Remark 5.24. Probabilistic arguments together with results from some numerical
experiments suggest that, for certain curves in Weierstrass form (5.2), if P
and Q denote independent nontorsion rational points, then the denominator
of nP + mQ can be the square of a prime infinitely often. Indeed, there seem
to be asymptotically $c \log X$ such primes with $|m|, |n| < X$. Of course, none of
the numerical examples that are considered in these arguments use magnified 
points. 
5.5.3 Elliptic Analogs of Zsigmondy’s Theorem

Zsigmondy’s Theorem (Theorem 1.16 on p. 28) has an elliptic analog. 


Theorem 5.25. [SILVERMAN] Let E denote an elliptic curve defined over Q,
in generalized Weierstrass form, and let $P = (x(P), y(P))$ denote a nontorsion
rational point on E. Let $x(nP) = A_n / B_n^2$ in lowest terms. Then the elliptic
divisibility sequence $(B_n)$ satisfies a Zsigmondy theorem: For all sufficiently
large n, $B_n$ has a primitive divisor.


In view of the fact that sequences such as $(B_n)$ seem likely to contain only
finitely many prime terms, Theorem 5.25 takes on a more interesting status, 
as a means of producing large primes from elliptic divisibility sequences. 

Analogs of the precise bound in Theorem 1.15 hold for certain elliptic divis- 
ibility sequences. The next result is an explicit bound for the first appearance 
of a primitive divisor in a congruent number curve. 


Example 5.26. Let E denote the curve
\[ E : y^2 = x^3 - 25x \]
and let P = (−4, 6). Then $B_n$ has a primitive divisor for every n > 1.
The factorizations of $B_n$ for this example, $2 \leq n \leq 8$, together with the
primitive divisors, are shown in Table 5.2.

Table 5.2. Primitive divisors of $(B_n)$.

n   B_n                          Factorization                                      Primitive divisors
2   12                           2^2 · 3                                            2, 3
3   2257                         37 · 61                                            37, 61
4   1494696                      2^3 · 3 · 7^2 · 31 · 41                            7, 31, 41
5   8914433905                   5 · 13 · 17 · 761 · 10601                          5, 13, 17, 761, 10601
6   178761481355556              2^2 · 3^2 · 11 · 37 · 61 · 71 · 587 · 4799         11, 71, 587, 4799
7   62419747600438859233         197 · 421 · 215153 · 3498052153                    197, 421, 215153, 3498052153
8   5354229862821602092291248    2^4 · 3 · 7^2 · 31 · 41 · 113279 · 3344161 · 4728001   113279, 3344161, 4728001
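The first rows of Table 5.2 can be regenerated from the addition formulas of Section 5.3. The sketch below, in Python, assumes the curve $y^2 = x^3 - 25x$ (so a = −25) and the point P = (−4, 6); the helper names `add`, `eds_denominators` and `primitive_part` are illustrative. The primitive part of $B_n$ is taken here to mean what is left of $B_n$ after removing every factor shared with an earlier term, so $B_n$ has a primitive divisor exactly when this part exceeds 1.

```python
from fractions import Fraction
from math import gcd, isqrt

def add(P, Q, a):
    """Chord-and-tangent addition on y^2 = x^3 + a*x + b; None is the identity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        return None
    if x1 != x2:
        alpha = Fraction(y2 - y1) / Fraction(x2 - x1)
    else:
        alpha = Fraction(3 * x1 * x1 + a) / Fraction(2 * y1)
    x3 = alpha * alpha - x1 - x2
    return (x3, alpha * (x1 - x3) - y1)

def eds_denominators(P, a, N):
    """Return [B_1, ..., B_N], where x(nP) = A_n / B_n^2 in lowest terms."""
    out, Q = [], None
    for _ in range(N):
        Q = add(Q, P, a)
        den = Fraction(Q[0]).denominator
        B = isqrt(den)
        assert B * B == den          # the denominator of x(nP) is a perfect square
        out.append(B)
    return out

def primitive_part(B, n):
    """Part of B_n coprime to B_1, ..., B_{n-1} (B is the list from above)."""
    r = B[n - 1]
    for m in range(1, n):
        g = gcd(r, B[m - 1])
        while g > 1:
            r //= g
            g = gcd(r, B[m - 1])
    return r

B = eds_denominators((-4, 6), -25, 8)
for n in range(2, 9):
    print(n, B[n - 1], primitive_part(B, n))
```

This brute-force approach is only practical for small n, since the number of digits of $B_n$ grows quadratically in n; the exact fractions are kept in lowest terms automatically by `Fraction`.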


There is a difference in the proof for the odd and even terms. For a sequence
$(B_n)$, define the even Zsigmondy bound of $(B_n)$ to be the greatest even
integer n for which $B_n$ does not have a primitive divisor, and similarly define
the odd Zsigmondy bound of $(B_n)$ to be the greatest odd integer for which $B_n$
does not have a primitive divisor.


Theorem 5.27. Let E denote the elliptic curve
\[ E : y^2 = x^3 - T^2 x \]
where T > 1 is a square-free integer. Let $P \in E(\mathbb{Q})$ denote a nontorsion
point and write $B_n^2$ for the denominator of x(nP). Then the even Zsigmondy
bound of the sequence $(B_n)$ is not greater than 18. If x(P) < 0, then the odd
Zsigmondy bound of $(B_n)$ is not greater than 3. If x(P) is a square, then the
odd Zsigmondy bound is not greater than 21.


In specific cases, the terms not covered by Theorem 5.27 can be checked on 
a computer; this is how Example 5.26 was computed. Theorem 5.27 will not 
be proved here, but the main idea is contained in the following exercise. The 
condition stated there for the absence of a primitive divisor is very similar to 
that found for the Mersenne numbers in Exercise 1.16(b) on p. 28. 


Exercise 5.24. It can be shown that if $B_n$ does not have a primitive divisor
then
\[ B_n \,\Big|\, n \prod_{p \mid n} B_{n/p}. \]


Assuming this, use Theorem 5.2 to deduce that n must be bounded. 


5.6 Ramanujan Numbers and the Taxicab Problem 


In view of Example 5.21 and the story concerning Ramanujan, integers N for 
which the Diophantine equation 


\[ N = x^3 + y^3 \]

has two nontrivially distinct solutions are sometimes called Ramanujan numbers.
Table 5.3 shows the first few of these; there are infinitely many such
numbers. In the table $u^3 + v^3 = x^3 + y^3$.

Table 5.3. The first few Ramanujan numbers.

N       u    v    x    y
1729    1    12   9    10
4104    2    16   9    15
13832   18   20   2    24
20683   10   27   19   24
32832   18   30   4    32

Indeed, it turns out that for any k there are infinitely many numbers N
with the property that N can be expressed as a nontrivial sum of two cubes
in k essentially different ways. The smallest number T(k) with this property is
called the kth taxicab number or Hardy-Ramanujan number. Table 5.4 shows
the known taxicab numbers with the pairs whose cubes sum to the number, 
and the discoverer. 
Table 5.4. The first few taxicab numbers.

k   T(k)                Pairs                                      Discoverer
1   2                   (1, 1)
2   1729                (1, 12), (9, 10)                           de Bessy (1657)
3   87539319            (167, 436), (228, 423), (255, 414)         Leech (1957)
4   6963472309248       (2421, 19083), (5436, 18948),              Rosenstiel et al. (1991)
                        (10200, 18072), (13322, 16630)
5   48988659276962496   (38787, 365757), (107839, 362753),         Wilson (1997)
                        (205292, 342952), (221424, 336588),
                        (231518, 331954)

It is suspected that
\[ T(6) = 24153319581254312065344. \]
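The entries of Table 5.4 are easy to confirm with exact integer arithmetic. The Python sketch below only checks that each listed pair of cubes sums to the stated T(k) and that k pairs are given; it does not verify minimality, which is the hard part of each discovery. The dictionary name `TAXICAB` is illustrative.

```python
# k -> (T(k), list of pairs whose cubes sum to T(k)), copied from Table 5.4
TAXICAB = {
    1: (2, [(1, 1)]),
    2: (1729, [(1, 12), (9, 10)]),
    3: (87539319, [(167, 436), (228, 423), (255, 414)]),
    4: (6963472309248,
        [(2421, 19083), (5436, 18948), (10200, 18072), (13322, 16630)]),
    5: (48988659276962496,
        [(38787, 365757), (107839, 362753), (205292, 342952),
         (221424, 336588), (231518, 331954)]),
}

checks = {
    k: len(pairs) == k and all(a ** 3 + b ** 3 == T for a, b in pairs)
    for k, (T, pairs) in TAXICAB.items()
}
print(checks)
```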


NOTES TO CHAPTER 5: The footnote about Bachet’s equation on p. 93 is taken from 
the book of Silverman and Tate [143]. A very thorough treatment of all aspects of 
elliptic curves is given in Silverman’s books [139], [142], and aspects of elliptic curves 
close to the topics in number theory we study are in Koblitz’s book [89]. These books 
are highly recommended to any reader interested in learning more about elliptic 
curves. The construction in Exercise 5.10 on p. 102 was shown to us by Bartholdi, 
and we thank him for permission to include it here. The congruent number problem 
and its connection to elliptic curves are described in detail in Koblitz’s book [89]. 
Mordell’s Theorem appears first in his paper [110]; the paper of Poincaré mentioned 
is [116]. An attractive historical account of Mordell’s theorem may be found in the 
paper of Cassels [26]. Faltings’ Theorem on higher-genus curves appears in his
papers [61] and [62]. An account of some of the background needed for this proof
appears in the conference proceedings [34] edited by Cornell and Silverman. There 
are expositions of Faltings’ proof by Deligne [41] and Szpiro [149]. The claim about 
the integral solutions to Equation (5.3) may be found in several places, including 
a paper [14] by Billing and Mahler; the presentation in Section 5.4 comes from a 
course taught by Silverberg at Ohio State University. Mazur’s Theorem appeared 
first in his paper [105]; a treatment may also be found in Silverman’s book [139]. 
Elliptic divisibility sequences are discussed in the monograph [58, Chapter 10] by 
Everest, van der Poorten, Shparlinski and Ward. The incidence of primes in these 
sequences has been studied by Chudnovsky and Chudnovsky [30] (Example 5.14 is 
taken from that paper), Einsiedler, Everest and Ward [48] and Rogers [131]. Theo- 
rem 5.17 appears in the paper [56] of Everest, Miller and Stephens; Example 5.18 
comes from Cremona’s Web site [37]. More on Conjecture 5.19 may be found in a
paper of Everest and King [54]. More on Remark 5.24 may be found in a paper of 
Everest, Rogers and Ward [57] or Rogers’ thesis [131]. Exercise 5.23 is taken from the 
book [143, p. 149] by Silverman and Tate. Theorem 5.25 is proved in Silverman’s 
paper [140]; Example 5.26 and Theorem 5.27 are taken from a paper of Everest, 
McLaren and Ward [55]. References for the taxicab numbers in Table 5.4 may be 
found in Sloane’s on-line encyclopedia of integer sequences [144]; there is an
elementary account of the connection between T(2) and elliptic curves in an accessible
paper by Silverman [141], and the calculation of T(5) is described in an article by 
Wilson [163]. 
Elliptic Functions


Elliptic curves can be viewed from many different mathematical perspectives. 
In the last chapter, they were seen as primarily geometrical objects; in this 
chapter, we start by emphasizing their relationship with some classical tran- 
scendental functions from complex analysis. To motivate the material in this 
chapter, recall that the trigonometric functions sine and cosine parametrize 
the points on the circle S!. The rational points on the circle in turn parametrize 
Pythagorean triples. This gives a triangle of ideas involving the circle: in one 
corner are the classical transcendental functions, in another a compact group, 
and in the third a connection to a Diophantine problem. In the last chapter, 
we saw two corners of an analogous triangle involving elliptic curves. Ratio- 
nal points on elliptic curves give solutions to various Diophantine problems. 
Our next goal is to fill out the third corner of the elliptic triangle by finding 
transcendental functions that parametrize the points on elliptic curves. An 
important by-product of our work will be the justification that the operation 
defined by geometry in Chapter 5 really satisfies the axioms for a group. (See 
Theorem 6.5 and the comments just after.) 


6.1 Elliptic Functions 


Let $L \subset \mathbb{C}$ denote a lattice in the complex plane. This means L is the set
of integer linear combinations of two complex numbers $\omega_1$ and $\omega_2$ that are
linearly independent over R. Write $\langle \omega_1, \omega_2 \rangle$ for the lattice $\omega_1\mathbb{Z} + \omega_2\mathbb{Z} \subset \mathbb{C}$.
More generally, a lattice in $\mathbb{R}^n$ is any discrete subgroup isomorphic to $\mathbb{Z}^n$; a
lattice in C coincides with this definition by viewing C as $\mathbb{R}^2$.

One of the ways lattices of different dimensions arise naturally is in the 
study of periodic functions. The best-known example is the exponential func- 
tion 


\[ e : \mathbb{R} \to S^1 = \{ z \in \mathbb{C} \mid |z| = 1 \}, \qquad x \mapsto e^{ix}. \]
This is a periodic function because it satisfies $e(x + 2\pi) = e(x)$ for all $x \in \mathbb{R}$,
so e is periodic with respect to the one-dimensional lattice $2\pi\mathbb{Z} \subset \mathbb{R}$.
We are interested in complex functions f with the doubly-periodic property 
that 
\[ f(z + \omega_1) = f(z + \omega_2) = f(z), \]


that is, functions that are periodic with respect to L or L-periodic. 


Figure 6.1. The lattice $L$ spanned by $\omega_1$ and $\omega_2$ in $\mathbb{C}$.


The lattice $L$ is represented as a discrete subset of $\mathbb{C}$ in Figure 6.1: The
points of $L$ are the points where the dashed lines intersect. The shaded region

$$\Pi = \{r_1\omega_1 + r_2\omega_2 \mid 0 \le r_1, r_2 < 1\}$$

is a fundamental domain for the quotient $\mathbb{C}/L$ in the sense that each coset of $L$
has exactly one representative in $\Pi$. The $L$-periodic function analogous to the
exponential function that we will study is called the Weierstrass $\wp$-function
corresponding to $L$. For any $z \notin L$, this is defined to be

$$\wp_L(z) = \frac{1}{z^2} + \sum_{0 \ne \ell \in L} \left\{ \frac{1}{(z-\ell)^2} - \frac{1}{\ell^2} \right\}. \tag{6.1}$$

The elements of $L$ have to be enumerated in some way in order to define the
sum. For the moment, suppose some enumeration $L \setminus \{0\} = \{\ell_1, \ell_2, \ldots\}$ has
been fixed and define $\sum_{0 \ne \ell \in L} f(\ell)$ to be $\sum_{n=1}^{\infty} f(\ell_n)$. We will first prove that
the series in Equation (6.1) converges absolutely. It follows that the order in
which the enumeration takes place does not affect the value of the sum.


Lemma 6.1. The series

$$\wp_L(z) = \frac{1}{z^2} + \sum_{0 \ne \ell \in L} \left\{ \frac{1}{(z-\ell)^2} - \frac{1}{\ell^2} \right\}$$

is absolutely convergent for any $z \notin L$. The series defines a meromorphic
function whose only singularities are double poles at each lattice point in $L$.


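PROOF. Let $z$ be any point not in $L$. Write

$$\frac{1}{(z-\ell)^2} - \frac{1}{\ell^2} = \frac{1}{\ell^3} \cdot \frac{2z - z^2/\ell}{(z/\ell - 1)^2}.$$

Since $|z/\ell - 1|$ is bounded below by a positive constant, there is a constant $C$
depending on $z$ such that

$$\left| \frac{1}{(z-\ell)^2} - \frac{1}{\ell^2} \right| \le \frac{C}{|\ell|^3}.$$

Therefore, it is enough to prove that the series $\sum_{0 \ne \ell \in L} |\ell|^{-3}$ converges. To
see this, notice first that there is a constant $C > 0$ with the property that

$$|m\omega_1 + n\omega_2| \ge \frac{1}{C} \max\{|m|, |n|\}.$$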


Exercise 6.1. Prove that there are $8k$ integer pairs $(m,n)$ with $\max\{|m|, |n|\}$
equal to $k$. (See Figure 6.2, which suggests an inductive proof.)
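The count in Exercise 6.1 can be checked by brute force before attempting the inductive proof. The following is a minimal sketch; the function name `count_pairs` is an illustrative choice and not part of the text.

```python
def count_pairs(k: int) -> int:
    """Count integer pairs (m, n) with max(|m|, |n|) == k by brute force."""
    return sum(
        1
        for m in range(-k, k + 1)
        for n in range(-k, k + 1)
        if max(abs(m), abs(n)) == k
    )


# Compare the brute-force count with the claimed value 8k for small k.
for k in range(1, 6):
    print(k, count_pairs(k), 8 * k)
```

This is numerical evidence only; it does not replace the proof requested in the exercise.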


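It follows that

$$\sum_{0 \ne \ell \in L} \frac{1}{|\ell|^3} = \sum_{(m,n) \ne (0,0)} \frac{1}{|m\omega_1 + n\omega_2|^3} \le C^3 \sum_{(m,n) \ne (0,0)} \frac{1}{\max\{|m|,|n|\}^3} = C^3 \sum_{k=1}^{\infty} \frac{8k}{k^3} = 8C^3 \sum_{k=1}^{\infty} \frac{1}{k^2},$$

which converges. We have shown that the series defining $\wp_L(z)$ converges
absolutely for $z \in \mathbb{C} \setminus L$.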
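Finally, it is clear that the only pole of $\wp_L$ in $\Pi$ is a double pole at 0 since

$$\sum_{0 \ne \ell \in L} \left\{ \frac{1}{(z-\ell)^2} - \frac{1}{\ell^2} \right\}$$

converges absolutely in $\Pi$. Similarly, for any $\ell' \in L$,

$$\wp_L(z) - \frac{1}{(z-\ell')^2} = \frac{1}{z^2} - \frac{1}{\ell'^2} + \sum_{\ell' \ne \ell \in L,\ \ell \ne 0} \left\{ \frac{1}{(z-\ell)^2} - \frac{1}{\ell^2} \right\}$$

converges absolutely in $\Pi + \ell'$ for the same reason, showing that the only pole
of $\wp_L$ in $\Pi + \ell'$ is a double pole at $\ell'$.

Figure 6.2. There are $8k$ integer pairs $(m,n)$ with $\max\{|m|, |n|\} = k$.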


The absolute convergence of $\wp_L(z)$ means that Equation (6.1) can be
differentiated term by term (see Exercise 6.2 below) to give

$$\wp_L'(z) = -2 \sum_{\ell \in L} \frac{1}{(z-\ell)^3}, \tag{6.2}$$

which also converges absolutely. It is clear that $\wp_L'(z)$ is periodic with respect
to $L$ since if $\ell_0 \in L$

$$\wp_L'(z + \ell_0) = -2 \sum_{\ell \in L} \frac{1}{(z + \ell_0 - \ell)^3} = -2 \sum_{\ell \in L} \frac{1}{(z-\ell)^3}$$

is just a rearrangement of the terms.

Our ultimate goal is to prove that $\wp_L$ is periodic with respect to $L$.
Periodicity of $\wp_L'$ does not itself imply this, of course, but a simple argument does
allow us to deduce it.

Exercise 6.2. (a) Let $f(z) = \sum_{n=0}^{\infty} c_n z^n$ be a complex power series with
radius of convergence $R > 0$. Prove that $f$ is differentiable (see Definition 8.18
on p. 170) on the set $\{z \in \mathbb{C} \mid |z| < R\}$ and that $f'(z) = \sum_{n=1}^{\infty} n c_n z^{n-1}$ on
this set.

(b) Show how to use this to justify the expression Equation (6.2) by using
the absolute convergence of the series defining $\wp_L$ to show that it may be
expanded as a power series.


What we have done up to now might seem clumsy: Given a series whose
terms are clearly differentiable, the most natural way to show it is
differentiable is surely to differentiate term by term. This is a reasonable criticism;
however, it involves a more subtle notion of convergence called uniform
convergence (see Section 8.5). Term-by-term differentiability is easily provable
for power series, whose terms are simply monomials, but can be much trickier
when the terms are more complicated functions. This alternative approach
to the analyticity of $\wp_L$ is given in Exercise 8.20 on p. 173, using the
concept of uniform convergence, once we have had time to introduce the concept
properly.


Lemma 6.2. The Weierstrass $\wp$-function $\wp_L$ is periodic with respect to $L$.

PROOF. We want to prove that

$$\wp_L(z + \omega_1) = \wp_L(z + \omega_2) = \wp_L(z)$$

for all $z \notin L$. First, notice that by Equation (6.1) and Equation (6.2),

$$\wp_L(-z) = \wp_L(z) \quad\text{and}\quad \wp_L'(-z) = -\wp_L'(z).$$

That is, $\wp_L(z)$ is an even function and $\wp_L'(z)$ is an odd function. Now fix $i$ to
be 1 or 2 and let

$$f(z) = \wp_L(z + \omega_i) - \wp_L(z).$$

Then $f$ is differentiable for all $z \notin L$. Since $\wp_L'(z)$ is periodic with respect
to $L$, we deduce that $f'(z) = 0$ for all $z \in \mathbb{C} \setminus L$, so $f$ is constant on this open
connected set.
To determine the constant value of $f$ let $z = -\omega_i/2$. Then

$$f(-\omega_i/2) = \wp_L(\omega_i/2) - \wp_L(-\omega_i/2),$$

which shows that $f(-\omega_i/2) = 0$ since $\wp_L$ is an even function. It follows that $f$
must be zero everywhere, showing that $\wp_L$ is periodic with respect to $L$.
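Lemma 6.2 can be illustrated numerically by truncating the series in Equation (6.1). The sketch below assumes the square lattice $\langle 1, i \rangle$ and a symmetric truncation to $|m|, |n| \le N$; the function name `wp`, the cutoff `N = 40`, and the sample point are illustrative choices, and the truncated sum is only approximately periodic.

```python
def wp(z: complex, w1: complex = 1.0, w2: complex = 1j, N: int = 40) -> complex:
    """Truncated Weierstrass sum over lattice points m*w1 + n*w2 with |m|, |n| <= N."""
    total = 1 / z**2
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if m == 0 and n == 0:
                continue  # the term for the origin is the leading 1/z^2
            l = m * w1 + n * w2
            total += 1 / (z - l) ** 2 - 1 / l**2
    return total


z = 0.3 + 0.2j
# Shifting z by a period should change the truncated sum only slightly.
print(abs(wp(z + 1) - wp(z)))
print(abs(wp(z + 1j) - wp(z)))
# The symmetric truncation is even in z up to rounding error.
print(abs(wp(-z) - wp(z)))
```

The first two differences shrink as `N` grows, in line with the absolute convergence proved in Lemma 6.1.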


Definition 6.3. An elliptic function is a meromorphic function $\mathbb{C} \to \mathbb{C}$ that
is periodic with respect to a lattice $L$. If $L = \mathbb{Z}\omega_1 + \mathbb{Z}\omega_2$, then $\omega_1$ and $\omega_2$ are
known as periods. With respect to a chosen basis $\{\omega_1, \omega_2\}$, the domain

$$\Pi = \{r_1\omega_1 + r_2\omega_2 \mid 0 \le r_1, r_2 < 1\}$$

for the lattice $L$ is the fundamental domain.

Lemma 6.4. An elliptic function with no poles in its fundamental domain is
constant. Let $\Pi_\beta = \beta + \Pi$ be the fundamental domain translated by $\beta \in \mathbb{C}$,
and let $f$ denote an elliptic function with no zeros or poles on the boundary
of $\Pi_\beta$. If the zeros of $f$ in $\Pi_\beta$ have orders $m_i$ and the poles have orders $n_j$,
then $\sum m_i = \sum n_j$.

PROOF. The first statement is clear: Any such function would be bounded
on $\Pi$, and therefore on all of $\mathbb{C}$, by periodicity, so it is a bounded entire
function and therefore must be constant by Liouville's Theorem.

For the second statement, first notice that

$$\int_{\partial \Pi_\beta} f(z)\, dz = 0$$

since $f$ has the same values on opposite sides of $\Pi_\beta$, while $dz$ changes sign. The
result now follows by applying this to the elliptic function $g(z) = f'(z)/f(z)$.
Near a zero $z_0$ of order $m$ for $f$, $g$ has a simple pole with residue $m$ (that
is, $g(z)$ behaves like $\frac{m}{z - z_0}$ near $z_0$). Near a pole $z_0$ of order $n$ for $f$, $g$ has a
simple pole with residue $-n$ (that is, $g(z)$ behaves like $\frac{-n}{z - z_0}$).
Cauchy's Residue Theorem gives the result.


6.2 Parametrizing an Elliptic Curve 


Lemma 6.4 will be used to prove the main result of this section: The values 
of $\wp_L(z)$ and $\wp_L'(z)$, for $z$ lying in the fundamental domain, parametrize a
complex elliptic curve. Before stating this important result, we return to the 
question raised at the end of Section 5.1: What is the identity element for the 
binary operation on an elliptic curve? 

In order to answer this, we need to come clean about elliptic curves. The 
discussion in Section 5.1 concerned the set of solutions to an equation
$y^2 = x^3 + ax^2 + bx + c$ in $\mathbb{R}^2$; these are just an affine part of the real points of the
curve. A complex elliptic curve is really the set of complex points in projective 
space satisfying the projectivized version of the equation. The vague notion 
of adding a point ‘at infinity’ can be made precise by studying elliptic curves 
in this more natural setting of projective space. 

Two-dimensional projective space $\mathbb{P}^2(\mathbb{C})$ is defined to be the set of
equivalence classes

$$\mathbb{P}^2(\mathbb{C}) = \{(z_0, z_1, z_2) \in \mathbb{C}^3 \mid (z_0, z_1, z_2) \ne (0,0,0)\}/\sim,$$

where $(z_0, z_1, z_2) \sim (z_0', z_1', z_2')$ if there is a constant $\lambda \ne 0$ with

$$(z_0, z_1, z_2) = (\lambda z_0', \lambda z_1', \lambda z_2').$$

An element of $\mathbb{P}^2(\mathbb{C})$ is then an equivalence class, and we write

$$[z_0, z_1, z_2] = \{(z_0', z_1', z_2') \mid (z_0', z_1', z_2') \sim (z_0, z_1, z_2)\}$$

for the equivalence class containing $(z_0, z_1, z_2)$.
The complex elliptic curve $E(\mathbb{C})$ associated with the equation

$$E : y^2 = x^3 + ax^2 + bx + c$$

is the subset of $\mathbb{P}^2(\mathbb{C})$ defined by

$$E(\mathbb{C}) = \{[z_0, z_1, z_2] \mid z_1^2 z_2 = z_0^3 + a z_0^2 z_2 + b z_0 z_2^2 + c z_2^3\}.$$

Notice that this curve contains two parts. If $z_2 \ne 0$, then we can assume
without loss of generality that $z_2 = 1$, so all the points $[z_0, z_1, 1]$ with

$$z_1^2 = z_0^3 + a z_0^2 + b z_0 + c$$

lie on $E$. This is the complex affine part of the curve. There is exactly one point
with $z_2 = 0$ (if $z_2 = 0$, then $z_0 = 0$ so $z_1$ must be nonzero), namely $[0, 1, 0]$.
This point is the "point at infinity" on the curve.

We will write

$$E : y^2 = x^3 + ax^2 + bx + c$$

for the complex projective curve, suppressing the third variable (because it
only contributes one point to the curve). We will always assume that the
right-hand side has no repeated zeros. (See Exercise 2.14 for a simple formulation
of this condition in the case $a = 0$.)

It will be useful to talk about the $K$-points of an elliptic curve for other
fields $K$. The curve $E : y^2 = x^3 + ax^2 + bx + c$ is said to be defined over a field $L$
if the coefficients $a, b, c$ come from $L$. For any field $K$ containing $L$, the
$K$-points of the curve, $E(K)$, are the points in $E$ whose projective coordinates
can be chosen in $K$. Thus $E(\mathbb{C})$ is the complex projective curve. The following
is a major result and most of this section will be devoted to the proof.


Theorem 6.5. Let $L \subset \mathbb{C}$ denote a lattice with fundamental domain $\Pi$.

(1) There are constants $a = a(L)$ and $b = b(L)$ with $4a^3 + 27b^2 \ne 0$ such that,
for all $z \in \mathbb{C} \setminus L$,

$$\tfrac{1}{4} \wp_L'(z)^2 = \wp_L(z)^3 + a\wp_L(z) + b.$$

(2) For $z \in \mathbb{C}/L$, the map $\pi : \Pi \to \mathbb{P}^2(\mathbb{C})$ defined by $\pi(0) = [0,1,0]$ and

$$\pi(z) = [\wp_L(z), \tfrac{1}{2}\wp_L'(z), 1], \quad z \ne 0,$$

defines a bijection between $\Pi$ and the set of complex projective points on
the elliptic curve $E : y^2 = x^3 + ax + b$.
(3) Suppose $z_1, z_2, z_3 \in \Pi$ have images $\pi(z_i) = P_i$, $i = 1, 2, 3$. Then

$$z_1 + z_2 + z_3 = 0$$

in $\Pi$ if and only if $P_1$, $P_2$, and $P_3$ lie on a straight line.

The last part of the theorem is the long-awaited justification that the
operation defined on the points of an elliptic curve in Chapter 5 is a group
operation. Under the bijection

$$\pi : z \mapsto [\wp_L(z), \tfrac{1}{2}\wp_L'(z), 1], \quad z \ne 0,$$

the fact that the point $0 = z \in \mathbb{C}/L$ corresponds to the point at infinity
relates the geometrical idea of infinity on the projective curve to the analytic
idea that $\wp_L(z) \to \infty$ as $z \to 0$. This is important if we work with the
projective curve because the set of projective points forms a group with the
point at infinity as the identity. Notice that this arises simply by transporting
the group structure of $\mathbb{C}/L$ to the curve $E$. Theorem 6.5(3) says that the
familiar addition in $\mathbb{C}$ is related, via the transcendental functions $\wp_L$ and $\wp_L'$,
to the geometric addition on the projective curve. This transport of structure
from the additive group $\mathbb{C}$ to the curve proves that the geometric binary
operation on the projective curve really does satisfy the group axioms. Now
the 'Lefschetz principle' (see the footnote on p. 108) shows that this result
over $\mathbb{C}$ extends to verify the group law for elliptic curves over arbitrary fields
in characteristic not equal to 2 or 3.


Exercise 6.3. Show that

$$\wp_L'(\omega_1/2) = \wp_L'(\omega_2/2) = \wp_L'((\omega_1 + \omega_2)/2) = 0.$$

Show that there are no other solutions of $\wp_L'(z) = 0$ with $z \in \Pi$.


Exercise 6.3 identifies the 2-torsion points on the elliptic curve with refer- 
ence to the lattice L. The complex torsion on an elliptic curve can easily be 
described. We will take a brief interlude to apply Theorem 6.5 to the study 
of the complex torsion points on an elliptic curve. The proof of Theorem 6.5 
will follow in Section 6.4. 


6.3 Complex Torsion 


Theorem 6.5 allows the torsion points on an elliptic curve to be understood in
a way that is analogous to our understanding of torsion points on the circle:
Since $e : \mathbb{R} \to S^1$ has kernel $2\pi\mathbb{Z}$, it induces an isomorphism

$$e : \mathbb{R}/2\pi\mathbb{Z} \to S^1.$$

The distinct points of order dividing $n$ in the additive group $\mathbb{R}/2\pi\mathbb{Z}$ are those
of the form $\frac{2\pi j}{n} + 2\pi\mathbb{Z}$ for $j = 0, 1, \ldots, n-1$. We deduce that the points of order
dividing $n$ in $S^1$ are those of the form $e(2\pi j/n) = e^{2\pi i j/n}$ for $j = 0, 1, \ldots, n-1$.

It is not difficult to find the points of order dividing $n$ on $S^1$. Theorem 6.5
repeats the trick for the problem of finding all points of order dividing $n$ for
the group operation on a complex elliptic curve. Given $1 < n \in \mathbb{N}$, the points

$$z = (r_1\omega_1 + r_2\omega_2)/n \quad\text{for integers } 0 \le r_1, r_2 < n$$

all have $nz = 0$ modulo $L$, and these are the $n^2$ points with order dividing $n$
in the group $\mathbb{C}/L$. These are torsion points on the complex curve. Deciding
which of these points correspond to rational torsion points on the curve is a
different and difficult question.


Exercise 6.4. Let $E_n(\mathbb{C})$ for $n \in \mathbb{N}$ denote the subgroup of points on a
complex elliptic curve $E$ whose order divides $n$. Show that

$$E_n(\mathbb{C}) \cong \mathbb{Z}/n\mathbb{Z} \oplus \mathbb{Z}/n\mathbb{Z}.$$


6.4 Partial Proof of Theorem 6.5 


We are not going to prove all of Theorem 6.5; in particular we will not prove
that the quantity $4a^3 + 27b^2$ is not zero. A complete account may be found
in the references. What we will show is how the important equation in
Theorem 6.5(1) arises.

PROOF OF THEOREM 6.5(1). Assume first that $z$ has $|z| < |\ell|$ for all
nonzero $\ell \in L$. Then the Taylor expansion about $z = 0$ gives

$$(1 - z/\ell)^{-2} = 1 + \frac{2z}{\ell} + \frac{3z^2}{\ell^2} + \frac{4z^3}{\ell^3} + \cdots,$$

so that

$$\frac{1}{(z-\ell)^2} - \frac{1}{\ell^2} = \frac{2z}{\ell^3} + \frac{3z^2}{\ell^4} + \frac{4z^3}{\ell^5} + \cdots.$$

By absolute convergence of the series defining $\wp_L(z)$, we can rearrange the
terms in

$$\wp_L(z) = \frac{1}{z^2} + \sum_{0 \ne \ell \in L} \left\{ \frac{2z}{\ell^3} + \frac{3z^2}{\ell^4} + \frac{4z^3}{\ell^5} + \cdots \right\}$$

to get

$$\wp_L(z) = \frac{1}{z^2} + 2z {\sum}' \ell^{-3} + 3z^2 {\sum}' \ell^{-4} + 4z^3 {\sum}' \ell^{-5} + \cdots.$$

The ${\sum}'$ indicates that the sum is over the nonzero lattice points $\ell \in L$ only. For
any $n \in \mathbb{N}$, the terms of the form $\ell^{-2n-1}$ as $\ell$ runs through the nonzero terms
of $L$ cancel out in pairs: $(-\ell)^{-2n-1} = -\ell^{-2n-1}$. It follows that ${\sum}' \ell^{-2n-1} = 0$,
so the Laurent expansion of $\wp_L(z)$ about $z = 0$ looks like

$$\wp_L(z) = \frac{1}{z^2} + 3z^2 G_4(L) + 5z^4 G_6(L) + \cdots, \tag{6.3}$$

where

$$G_{2n}(L) = {\sum}' \ell^{-2n}, \quad 1 < n \in \mathbb{N}.$$

This expression agrees with the classical result that even meromorphic
functions only have even powers in their Laurent expansion at 0.

Consider the function

$$g(z) = \wp_L'(z)^2 - 4\wp_L(z)^3 + 60 G_4(L) \wp_L(z) + 140 G_6(L).$$

This function is analytic on $\Pi$; moreover, $g$ is periodic with respect to $L$
because it is an algebraic expression in periodic functions. Finally, it can be
checked that the Laurent expansion of $g(z)$ contains only positive powers of $z$.
By Lemma 6.4, $g$ must be a constant. Setting $z = 0$ shows that this constant
value must be zero, so $g$ is the zero function and hence the equation stated in
the theorem holds (after dividing by 4).

Notice that $a = -15 G_4(L)$ and $b = -35 G_6(L)$.
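The identity in Theorem 6.5(1), with $a = -15G_4(L)$ and $b = -35G_6(L)$, can be tested numerically on truncated lattice sums. This is a rough sketch assuming the square lattice $\langle 1, i \rangle$ and a symmetric cutoff $|m|, |n| \le 40$; all function names are illustrative, and the two sides agree only up to truncation error.

```python
def lattice_points(N, w1=1.0, w2=1j):
    """Nonzero lattice points m*w1 + n*w2 with |m|, |n| <= N (a symmetric truncation)."""
    return [m * w1 + n * w2
            for m in range(-N, N + 1)
            for n in range(-N, N + 1)
            if (m, n) != (0, 0)]


def wp(z, pts):
    """Truncated version of Equation (6.1)."""
    return 1 / z**2 + sum(1 / (z - l) ** 2 - 1 / l**2 for l in pts)


def wp_prime(z, pts):
    """Truncated version of Equation (6.2); the -2/z^3 term is the lattice point 0."""
    return -2 / z**3 - 2 * sum(1 / (z - l) ** 3 for l in pts)


def eisenstein(k, pts):
    """Truncated sum G_k(L) over the nonzero lattice points."""
    return sum(l ** (-k) for l in pts)


pts = lattice_points(40)
a = -15 * eisenstein(4, pts)
b = -35 * eisenstein(6, pts)
z = 0.3 + 0.2j
lhs = wp_prime(z, pts) ** 2 / 4
rhs = wp(z, pts) ** 3 + a * wp(z, pts) + b
# Relative discrepancy between the two sides of Theorem 6.5(1).
print(abs(lhs - rhs) / abs(lhs))
```

For this lattice the truncated value of $G_6$ is numerically negligible, which is consistent with Exercise 6.5(a).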


Notice that Theorem 6.5(1) is a statement about all $z \in \mathbb{C} \setminus L$. In the proof,
we have assumed that $|z| < |\ell|$ for all nonzero lattice points. This means in
particular that the proof is valid for all points in the region $-\frac{\omega_1 + \omega_2}{2} + \Pi$;
it follows for all $z \in \mathbb{C} \setminus L$ by periodicity.


Exercise 6.5. (a) Let $L = \langle 1, i \rangle$. Show that the corresponding curve $E_L$ has
equation $y^2 = x^3 + ax$ for some $a \in \mathbb{R}$.

(b) Let $L = \langle 1, \omega \rangle$, where $\omega$ denotes a cube root of unity. Show that the
corresponding elliptic curve $E_L$ has equation $y^2 = x^3 + b$ for some $b \in \mathbb{R}$.


PROOF OF THEOREM 6.5(2). We show that the map is a bijection, beginning
with surjectivity. Suppose $a \in \mathbb{C}$ is given. The function $\wp_L(z) - a$ has two
poles (actually one double pole) in $\Pi$ so, by Lemma 6.4, it must have two
zeros. To prove injectivity (which appears to be threatened by the existence
of the two zeros) note that the two zeros are negatives of each other. This is
because, for $z \notin L$, $\wp_L(-z) = \wp_L(z)$. However, $\wp_L'(-z) = -\wp_L'(z)$. Thus, the
images of $z$ and $-z$ will (usually) be distinct points on the curve, the only
counterexamples arising when $\wp_L'(z) = 0$. By Exercise 6.3, this happens for
only three values of $z$, namely $\omega_1/2$, $\omega_2/2$, and $(\omega_1 + \omega_2)/2$, but this is exactly
when $z$ and $-z$ define the same element of $\mathbb{C}/L$.


Finally, we show how an argument using complex analysis gives the third 
part of Theorem 6.5. 


PROOF OF THEOREM 6.5(3). Let the equation of the line containing the
points $P_1$ and $P_2$ be $y = mx + b$. Consider the function

$$f(z) = \tfrac{1}{2}\wp_L'(z) - m\wp_L(z) - b.$$

This has three poles in $\Pi$ (actually one triple pole) so, by Lemma 6.4, it has
three zeros. Two of these are $z_1$ and $z_2$; let $z_3$ denote the third. Then $P_1$, $P_2$,
and $P_3$ lie on the line $y = mx + b$ and (3) is seen by integrating the
function $h(z) = z f'(z)/f(z)$ over a displaced parallelogram $\Pi_\beta = \beta + \Pi$, where $\beta$
is chosen so that $h$ has no singularities on the boundary $\partial\Pi_\beta$ of $\Pi_\beta$ shown in
Figure 6.3.

Figure 6.3. Integrating along the four sides of $\Pi_\beta$, the parallelogram with vertices $\beta$, $\beta + \omega_1$, $\beta + \omega_1 + \omega_2$, and $\beta + \omega_2$.


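The main part of the proof is to show that $z_1 + z_2 + z_3 \in L$. By Cauchy's
Residue Theorem,

$$\frac{1}{2\pi i} \int_{\partial\Pi_\beta} h(z)\, dz = z_1 + z_2 + z_3, \tag{6.4}$$

because $h$ has a simple pole at each $z_i$ with residue $z_i$. Now break the integral
in Equation (6.4) into two parts corresponding to pairs of opposite sides in $\partial\Pi_\beta$:

$$\begin{aligned}
\frac{1}{2\pi i} \int_{\partial\Pi_\beta} h(z)\, dz
&= \frac{1}{2\pi i} \left( \int_{\beta}^{\beta+\omega_1} h(z)\, dz + \int_{\beta+\omega_1+\omega_2}^{\beta+\omega_2} h(z)\, dz \right) \\
&\quad + \frac{1}{2\pi i} \left( \int_{\beta+\omega_1}^{\beta+\omega_1+\omega_2} h(z)\, dz + \int_{\beta+\omega_2}^{\beta} h(z)\, dz \right) \\
&= I_1 + I_2.
\end{aligned}$$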


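Substitute $z = w + \omega_2$ in the second integral of $I_1$, and use the periodicity
of $f$ to obtain

$$I_1 = \frac{1}{2\pi i} \left( \int_{\beta}^{\beta+\omega_1} \frac{z f'(z)}{f(z)}\, dz - \int_{\beta}^{\beta+\omega_1} \frac{(z + \omega_2) f'(z)}{f(z)}\, dz \right) = -\frac{\omega_2}{2\pi i} \int_{\beta}^{\beta+\omega_1} \frac{f'(z)}{f(z)}\, dz.$$

Now make the substitution $u = f(z)$ to deduce that

$$I_1 = -\frac{\omega_2}{2\pi i} \int_{\Omega} \frac{1}{u}\, du,$$

where $\Omega$ is the image of the line joining $\beta$ to $\beta + \omega_1$ in the variable $u$. Periodicity
with respect to $L$ means that $\Omega$ is a closed curve, so we finally obtain

$$I_1 = -\frac{\omega_2}{2\pi i} \int_{\Omega} \frac{1}{u}\, du = m\omega_2 \in \mathbb{Z}\omega_2,$$

where the integer $-m$ is the winding number, counting the number of times $\Omega$
winds around zero.
A similar argument for the two other sides of $\partial\Pi_\beta$ shows that $I_2 = n\omega_1$ for
some $n \in \mathbb{Z}$. Thus

$$z_1 + z_2 + z_3 = n\omega_1 + m\omega_2 \in L.$$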


Exercise 6.6. (a) Prove that, for any lattice $L \subset \mathbb{C}$,

$$G_8(L) = \tfrac{3}{7} G_4(L)^2.$$

(b) More generally, prove that all the $G_k$ ($k \ge 8$) can be expressed as
polynomials in $G_4$ and $G_6$ with rational coefficients.


Exercise 6.7. (a) Given any nonzero $c \in \mathbb{C}$, consider the map $L \mapsto cL = L'$.
Let $E_L$ and $E_{L'}$ denote the corresponding elliptic curves. Prove that the map
defines a group isomorphism between $E_L(\mathbb{C})$ and $E_{L'}(\mathbb{C})$.

(b) Prove that the map in (a) has the following effect upon the coordinates
of the corresponding curves. If $y^2 = x^3 + ax + b$ is the equation defining $E_L$
and $y^2 = x^3 + a'x + b'$ is the equation defining $E_{L'}$, show that the effect of the
map in (a) is to take $(x, y)$ to $(c^{-2}x, c^{-3}y)$. (Hint: Recall the definition of $a$
and $b$ from Theorem 6.5(1).)


Exercise 6.8. (a) Show that, for any lattice $L$ and $c \in \mathbb{C}^*$,

$$G_4(cL) = c^{-4} G_4(L) \quad\text{and}\quad G_6(cL) = c^{-6} G_6(L).$$

(b) Prove that any elliptic curve $y^2 = x^3 + ax + b$ with $ab = 0$ is parametrized
by the Weierstrass $\wp$-function for some lattice $L$.


NOTES TO CHAPTER 6: The Lefschetz principle is discussed in Silverman [139,
Section VI.6]. Theorem 6.5 is also proved in [139] along with a converse result: Given $a$
and $b$ with $4a^3 + 27b^2 \ne 0$, there exists a lattice $L$ such that $\wp_L(z)$ and $\tfrac{1}{2}\wp_L'(z)$
parametrize the elliptic curve with equation $y^2 = x^3 + ax + b$. For an explanation
of the remarkable phenomenon described in Exercise 6.6(b), consult Koblitz [89]. A
classical treatment of elliptic functions from the analytic viewpoint is contained in
Whittaker and Watson [160]; there are sophisticated accounts of elliptic functions
and their role in number theory in the books of Apostol [5], Chandrasekharan [29],
Lang [95] and Weil [159].

Heights


In this chapter we introduce a way to measure the arithmetic complexity of 
points on elliptic curves. This measurement of the height turns out to be an 
essential ingredient in understanding the structure of the rational points on 
an elliptic curve. Our understanding of heights will be a key ingredient in the 
proof of Mordell’s Theorem in Section 7.2. 


7.1 Heights on Elliptic Curves 


Given a rational affine point $P = \left(\frac{M}{N}, y\right)$, where $M$ and $N$ are coprime integers,
define the naive height of $P$ to be

$$H(P) = \begin{cases} \max\{|M|, |N|\} & \text{if } M \ne 0, \\ 1 & \text{if } M = 0. \end{cases}$$

Write $x(P)$ and $y(P)$ for the coordinates of an affine point $P = (x(P), y(P))$.
Define the logarithmic height to be $h(P) = \log H(P)$.

The definition of the complex projective plane $\mathbb{P}^2(\mathbb{C})$ on p. 126 extends to
higher dimensions: For any field $K$, projective $N$-space over $K$ is defined by

$$\mathbb{P}^N(K) = \{(x_0, \ldots, x_N) \mid (x_0, \ldots, x_N) \ne (0, \ldots, 0)\}/\sim,$$

where $(x_0, \ldots, x_N) \sim (x_0', \ldots, x_N')$ if there is a constant $\lambda \in K^*$ with

$$(x_0, \ldots, x_N) = (\lambda x_0', \ldots, \lambda x_N').$$

As before, we write $[x_0, \ldots, x_N]$ for the equivalence class (or point in projective
space) containing the affine point $(x_0, \ldots, x_N)$.

The naive height extends to projective space $\mathbb{P}^N(\mathbb{Q})$. Given a point $[y]$
in $\mathbb{P}^N(\mathbb{Q})$, choose $x = (x_0, \ldots, x_N) \in \mathbb{Z}^{N+1}$ in such a way that $[y] = [x]$ and

$$\gcd(x_0, \ldots, x_N) = 1.$$

Then the projective height

$$H : \mathbb{P}^N(\mathbb{Q}) \to \mathbb{R}$$

is defined by

$$H([x]) = \max_{i = 0, \ldots, N} \{|x_i|\}.$$

Notice that this is compatible with the naive height in the following sense:
If $P = (x, y)$ is a point on the affine piece of $E(\mathbb{Q})$, then $H(P) = H([x, 1])$,
where $[x, 1] \in \mathbb{P}^1(\mathbb{Q})$.
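The definition of the projective height translates directly into exact rational arithmetic. The following is a small sketch using Python's standard `fractions` module; the function name `projective_height` is an illustrative choice. It clears denominators, removes the common factor, and returns the largest absolute value among the coordinates.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def projective_height(coords):
    """Projective height H([x]) of a point of P^N(Q) given by rational coordinates."""
    fracs = [Fraction(c) for c in coords]
    if all(f == 0 for f in fracs):
        raise ValueError("(0, ..., 0) does not define a projective point")
    # Clear denominators so that every coordinate is an integer.
    scale = reduce(lcm, (f.denominator for f in fracs))
    ints = [int(f * scale) for f in fracs]
    # Divide out the common factor so that the gcd of the coordinates is 1.
    g = reduce(gcd, ints)
    return max(abs(v) for v in ints) // g


# The naive height of a point with x-coordinate 1681/144 equals H([1681/144, 1]).
print(projective_height([Fraction(1681, 144), 1]))
```

The value does not depend on the representative chosen: rescaling all coordinates by a nonzero rational gives the same height.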

The logarithmic quantity h(P) is a simple example of a Mahler measure: 


$$\log \max\{|M|, |N|\} = m(Nx - M)$$


(see p. 150). 

Examples 5.1 and 5.10 suggest that the number of decimal digits in the nu- 
merator and the denominator roughly quadruples each time a point is doubled. 
This is a manifestation of a general phenomenon, the duplication formula. 


Theorem 7.1. [DUPLICATION FORMULA] Let E denote an elliptic curve de- 
fined over the rationals, and let P be a point in E(Q). Then 


h(2P) = 4h(P) + O(1), (7.1) 
where the implied constant in O depends on E but not on the point P. 


This will be proved on p. 137 after some more machinery has been devel- 
oped. 
In multiplicative notation, the duplication formula may be written 


$$H(P)^4 \ll H(2P) \ll H(P)^4.$$


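Example 7.2. Consider the curve $E : y^2 = x^3 - n^2x$ with $1 \le n \in \mathbb{N}$. Let $P$ be
a rational point on $E$. A calculation gives, with $x = x(P)$,

$$x(2P) = \frac{(x^2 + n^2)^2}{4x(x^2 - n^2)},$$

so if $x(P) = \frac{M}{N}$ in lowest terms, then

$$x(2P) = \frac{(M^2 + n^2N^2)^2}{4MN(M^2 - n^2N^2)}. \tag{7.2}$$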

It may be checked that any cancellation in Equation (7.2) is bounded:
Explicitly, if $d$ divides both numerator and denominator, then $d \mid 16n^8$. Examining
the cases $|M| \ge |N|$ and $|M| < |N|$ separately shows that


$$\max\{|M^2 + n^2N^2|^2, |4MN(M^2 - n^2N^2)|\}$$

is commensurate¹ with

$$\max\{|M|^4, |N|^4\} = \max\{|M|, |N|\}^4,$$

and the duplication formula Equation (7.1) follows.
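The growth described in Example 7.2 can be observed with exact arithmetic. The sketch below assumes the congruent number curve with $n = 5$ and the starting point $P = (-4, 6)$, which lies on $y^2 = x^3 - 25x$ since $(-4)^3 - 25 \cdot (-4) = 36$; the function names are illustrative choices, and only $x$-coordinates are tracked.

```python
from fractions import Fraction
from math import log


def double_x(x: Fraction, n: int) -> Fraction:
    """x(2P) on y^2 = x^3 - n^2 x in terms of x = x(P); assumes y(P) != 0."""
    return (x * x + n * n) ** 2 / (4 * (x**3 - n * n * x))


def naive_height(x: Fraction) -> int:
    """Naive height max(|M|, |N|) of x = M/N in lowest terms."""
    return max(abs(x.numerator), abs(x.denominator))


n = 5
x = Fraction(-4)  # x-coordinate of P = (-4, 6)
heights = []
for _ in range(4):
    heights.append(log(naive_height(x)))
    x = double_x(x, n)

# Ratios h(2Q)/h(Q) along Q = P, 2P, 4P; the duplication formula
# predicts that these approach 4 as the heights grow.
ratios = [h2 / h1 for h1, h2 in zip(heights, heights[1:])]
print(ratios)
```

The first ratio is distorted by the bounded term in Equation (7.1), which matters most when the height is small.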
Exercise 7.1. Verify Theorem 7.1 for the curve $y^2 = x^3 + C$, $C \ne 0$.


The duplication formula is a special case of a general principle about poly- 
nomial maps on projective space, and so we prove Theorem 7.1 in greater 
generality. 

A polynomial $f$ in $N$ variables is called homogenous if there is a
constant $d \in \mathbb{N}$ (the degree of $f$) with

$$f(\lambda x_1, \ldots, \lambda x_N) = \lambda^d f(x_1, \ldots, x_N).$$


Exercise 7.2. Let $f_0, \ldots, f_M$ be polynomials in $N + 1$ variables. Show that
the map $x \mapsto (f_0(x), \ldots, f_M(x))$ between $K^{N+1}$ and $K^{M+1}$ induces a
well-defined map $\mathbb{P}^N(K) \to \mathbb{P}^M(K)$ if and only if the polynomials $f_0, \ldots, f_M$ are all
homogenous of the same degree and the only common zero of the polynomials
is the point $(0, \ldots, 0)$.


Definition 7.3. A map

$$f : \mathbb{P}^N(\mathbb{Q}) \to \mathbb{P}^M(\mathbb{Q})$$

is called a morphism of degree $d$ if

$$f([x]) = f([x_0, \ldots, x_N]) = [f_0(x), \ldots, f_M(x)],$$

where the $f_i$, $0 \le i \le M$, are homogenous polynomials of degree $d$ with the
property that the only common zero is 0.


Lemma 7.4. Let $f : \mathbb{P}^N(\mathbb{Q}) \to \mathbb{P}^M(\mathbb{Q})$ be a morphism of degree $d$. Then

$$H([x])^d \ll H(f([x])) \ll H([x])^d.$$


PROOF. Write $f([x]) = [f_0(x), \ldots, f_M(x)]$, where $[x] = [x_0, \ldots, x_N] \in \mathbb{P}^N(\mathbb{Q})$.
By clearing denominators, we may assume that each $x_i$ is an integer. Since
each $f_i$ is homogenous of degree $d$, they may be written

$$f_i(x) = \sum_{e} c_e x_0^{e_0} \cdots x_N^{e_N},$$

with $c_e \in \mathbb{Q}$, $e_i \in \mathbb{N}$, $e_0 + \cdots + e_N = d$, and only finitely many $c_e$ nonzero. It
follows that there is a constant $C$ such that

$$|f_i(x)| \le C \cdot (\max_j\{|x_j|\})^d$$

for each $i$, so there is a similar bound for $\max_i\{|f_i(x)|\}$. To find the
height, notice that the only possible denominators that need to be cleared
come from the coefficients of the polynomials $f_i$, which is a bounded quantity
in total. It follows that there is an upper bound for the height of the form

$$C \cdot H(x)^d.$$

¹ In the sense that the ratio is bounded above and below by positive constants
independent of $N$ and $M$.


To get the lower bound, we use Hilbert's Nullstellensatz: There exists $e \in \mathbb{N}$
and polynomials $g_{ij} \in \mathbb{Q}[x]$ such that

$$\begin{aligned}
x_0^e &= g_{00}(x) f_0(x) + \cdots + g_{0M}(x) f_M(x) \\
&\;\;\vdots \\
x_N^e &= g_{N0}(x) f_0(x) + \cdots + g_{NM}(x) f_M(x).
\end{aligned}$$

The $g_{ij}$ can be taken to be homogenous polynomials of degree $(e - d)$ so

$$|g_{ij}(x)| \ll (\max\{|x_j|\})^{e-d}.$$

On the other hand,

$$x_j^e = g_{j0}(x) f_0(x) + \cdots + g_{jM}(x) f_M(x)$$

for $j = 0, \ldots, N$, so

$$(\max\{|x_j|\})^{e-d} \max\{|f_0|, \ldots, |f_M|\} \gg (\max\{|x_j|\})^e.$$

It follows that

$$\max\{|f_0|, \ldots, |f_M|\} \gg (\max\{|x_j|\})^d,$$

and since the only possible denominators are those arising from the coefficients
of the $f_i$, the lower bound is proved.


Example 7.5. To see that $e > d$ really occurs in the Nullstellensatz, define

$$f : \mathbb{P}^1(\mathbb{Q}) \to \mathbb{P}^1(\mathbb{Q})$$

by

$$f : [x_0, x_1] \mapsto [x_0^2, (x_0 + x_1)^2] = [f_0(x_0, x_1), f_1(x_0, x_1)].$$

Then $f$ is a morphism of degree 2. Now $x_0^2 = 1 \cdot f_0$, but there are no rational
polynomials $A$, $B$ for which $x_1^2 = A \cdot f_0 + B \cdot f_1$. However,

$$\begin{aligned}
x_0^3 &= x_0 \cdot f_0, \\
x_1^3 &= (2x_0 + 3x_1) \cdot f_0 + (-2x_0 + x_1) \cdot f_1.
\end{aligned}$$

Exercise 7.3. (a) Using the explicit formulas from Example 7.2, prove that
the map defined by $[x(P), 1] \mapsto [x(2P), 1]$ is a morphism of degree 4 for the
curve $y^2 = x^3 - n^2x$.

(b) Do the same for the curve $y^2 = x^3 + c$.
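The two identities displayed in Example 7.5 can be spot-checked on a grid of integer values. A polynomial identity of degree 3 in two variables is determined by its values on such a grid, so the check below is conclusive for these particular identities. The function names are illustrative choices.

```python
def f0(x0: int, x1: int) -> int:
    """First component of the morphism in Example 7.5."""
    return x0**2


def f1(x0: int, x1: int) -> int:
    """Second component of the morphism in Example 7.5."""
    return (x0 + x1) ** 2


def identities_hold(bound: int = 6) -> bool:
    """Check both cubic identities at every integer point with |x0|, |x1| <= bound."""
    for x0 in range(-bound, bound + 1):
        for x1 in range(-bound, bound + 1):
            a, b = f0(x0, x1), f1(x0, x1)
            if x0**3 != x0 * a:
                return False
            if x1**3 != (2 * x0 + 3 * x1) * a + (-2 * x0 + x1) * b:
                return False
    return True


print(identities_hold())
```

The claim that no such expression exists for $x_1^2$ is not tested here; it follows by comparing coefficients, since $A$ and $B$ would have to be constants and the $x_0x_1$ term forces $B = 0$.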


PROOF OF THEOREM 7.1. By Lemma 7.4, all we need to show is that the
map

$$[x, 1] \mapsto [x(2P), 1]$$

on $\mathbb{P}^1(\mathbb{Q})$ is a morphism of degree 4. Assume that the curve is

$$E : y^2 = x^3 + ax + b$$

and $P = (x, y)$. Then

$$\begin{aligned}
x(2P) &= \left( \frac{3x^2 + a}{2y} \right)^2 - 2x \\
&= \frac{(3x^2 + a)^2}{4y^2} - 2x \\
&= \frac{9x^4 + 6x^2a + a^2}{4(x^3 + ax + b)} - 2x \\
&= \frac{x^4 - 2x^2a - 8xb + a^2}{4(x^3 + ax + b)}.
\end{aligned}$$

Write $x = \frac{x_0}{x_1} \in \mathbb{Q}$ (in lowest terms as usual). Then, writing $x(2P) = \frac{f_0(x_0, x_1)}{f_1(x_0, x_1)}$
and dropping a factor of 4,

$$\begin{aligned}
f_0(x_0, x_1) &= x_0^4 - 2a x_0^2 x_1^2 - 8b x_0 x_1^3 + a^2 x_1^4, \text{ and} \\
f_1(x_0, x_1) &= x_0^3 x_1 + a x_0 x_1^3 + b x_1^4.
\end{aligned}$$

To show that these define a morphism of degree 4, it only remains to show
that the unique common zero of $f_0$ and $f_1$ is $(0, 0)$. If $f_0 = f_1 = 0$ and $x_1 = 0$,
then $x_0 = 0$. Assume $x_1 \ne 0$. Then we may assume that $x_1 = 1$ and $x_0 = x$.
We now need to show that

$$f(x) = x^4 - 2x^2a - 8xb + a^2 \quad\text{and}\quad g(x) = x^3 + ax + b$$

cannot have a common zero. One way to see this is using resultants (see
Exercise 7.4); we will use the Euclidean Algorithm (see Example 2.3(2)) to
find the greatest common divisor of $f$ and $g$. Assume first that $a \ne 0$ and
recall we are assuming that $4a^3 + 27b^2 \ne 0$. The Euclidean Algorithm gives

$$\begin{aligned}
x^4 - 2x^2a - 8xb + a^2 &= (x^3 + ax + b)\,x - 3ax^2 - 9bx + a^2; \\
x^3 + ax + b &= (-3ax^2 - 9bx + a^2)\left( -\frac{x}{3a} + \frac{b}{a^2} \right) + \left( \frac{4a}{3} + \frac{9b^2}{a^2} \right) x; \\
-3ax^2 - 9bx + a^2 &= \left( \left( \frac{4a}{3} + \frac{9b^2}{a^2} \right) x \right) \left( -\frac{9a^3 x + 27ba^2}{4a^3 + 27b^2} \right) + a^2,
\end{aligned}$$

which shows that the greatest common divisor of $f$ and $g$ is a nonzero constant.
If $a = 0$ then $b \ne 0$ since $4a^3 + 27b^2 \ne 0$, so the Euclidean Algorithm gives

$$\begin{aligned}
x^4 - 8xb &= (x^3 + b)\,x - 9xb; \\
x^3 + b &= (-9xb)\left( -\frac{x^2}{9b} \right) + b,
\end{aligned}$$

which, again, shows the greatest common divisor of $f$ and $g$ is a nonzero
constant.
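The Euclidean Algorithm step in this proof can be replayed with exact rational arithmetic for specific coefficients. The sketch below is one possible implementation on coefficient lists (highest degree first); the helper names are illustrative choices. It reports the degree of the greatest common divisor of $f$ and $g$, which should be 0 exactly when $4a^3 + 27b^2 \ne 0$.

```python
from fractions import Fraction


def poly_rem(num, den):
    """Remainder of num divided by den; coefficient lists, highest degree first."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i, c in enumerate(den):
            num[i] -= factor * c
        num.pop(0)  # the leading coefficient is now zero
    while num and num[0] == 0:
        num.pop(0)  # strip leading zeros so the degree is read off correctly
    return num


def common_factor_degree(a, b):
    """Degree of gcd(f, g) for f = x^4 - 2ax^2 - 8bx + a^2 and g = x^3 + ax + b."""
    f = [1, 0, -2 * a, -8 * b, a * a]
    g = [1, 0, a, b]
    while g:
        f, g = g, poly_rem(f, g)
    return len(f) - 1


print(common_factor_degree(-43, 166))  # a curve with 4a^3 + 27b^2 != 0
print(common_factor_degree(-3, 2))     # here 4a^3 + 27b^2 = 0
```

In the second call the cubic $x^3 - 3x + 2 = (x-1)^2(x+2)$ has a repeated root, and that root is also a zero of $f$, so the common factor is no longer constant.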


Exercise 7.4. Show that the resultant of the polynomials

$$f(x) = x^4 - 2x^2a - 8xb + a^2 \quad\text{and}\quad g(x) = x^3 + ax + b$$

is $(4a^3 + 27b^2)^2$. This shows that the condition $4a^3 + 27b^2 \ne 0$ implies that $f$
and $g$ have no common zero.


Exercise 7.5. Let $E : y^2 = x^3 + ax + b$ with $a, b \in \mathbb{Z}$ denote an elliptic
curve. Using arguments from $p$-adic analysis, it can be shown that any nonzero
torsion point $Q \in E(\mathbb{Q})$ must have $x(Q)$ and $y(Q)$ integral. Assuming this,
prove that $y(Q) = 0$ or $y(Q)^2$ divides $4a^3 + 27b^2$ for any rational torsion point.


Exercise 7.6. Recall from Exercise 5.12 that the point P = (3,8) has order 7 
on the elliptic curve 
$$y^2 = x^3 - 43x + 166.$$


Using Exercise 7.5, show that there are no rational torsion points other than 
those in the subgroup generated by P. 
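The claim that $P = (3, 8)$ has order 7 can be confirmed with the chord and tangent formulas in exact arithmetic. This is a minimal sketch in which the identity is represented by `None` and the curve coefficients are fixed as module constants; these representation choices and the function name `add` are illustrative, not taken from the text.

```python
from fractions import Fraction

A, B = -43, 166  # the curve y^2 = x^3 - 43x + 166


def add(P, Q):
    """Chord and tangent addition on y^2 = x^3 + Ax + B; None is the identity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        return None  # vertical line: the points are inverses of each other
    if P == Q:
        lam = (3 * x1 * x1 + A) / (2 * y1)  # slope of the tangent
    else:
        lam = (y2 - y1) / (x2 - x1)  # slope of the chord
    x3 = lam * lam - x1 - x2
    y3 = lam * (x1 - x3) - y1
    return (x3, y3)


P = (Fraction(3), Fraction(8))
multiples = [None]  # multiples[k] holds kP
for _ in range(7):
    multiples.append(add(multiples[-1], P))

for k, point in enumerate(multiples):
    print(k, point)
```

Every nonzero multiple has integral coordinates, and each $y$-coordinate squared divides $4a^3 + 27b^2 = 425984$, in line with Exercise 7.5.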


7.2 Mordell’s Theorem 


In this section, we will see how Mordell’s Theorem follows from the weak 
Mordell Theorem. In the next section, we will give a proof of the weak Mordell 
Theorem for the congruent number curve and discuss how the proof can be 
extended to cover a wider class of curves. The proof in full generality requires 
more algebraic number theory than we have at our disposal. Complete proofs 
may be found in the references discussed at the end of the chapter. 


Theorem 7.6. [WEAK MORDELL THEOREM] Let E denote an elliptic curve 
defined over Q. Then E(Q)/2E(Q) is a finite Abelian group. 


Lemma 7.7. Let $E : y^2 = x^3 + ax + b$ be an elliptic curve defined over the
rationals.

(1) If $P_0 \ne 0$ is a point in $E(\mathbb{Q})$, then there is a constant $c_1 = c_1(E, P_0) > 0$
such that

$$h(P + P_0) \le 2h(P) + c_1. \tag{7.3}$$

(2) Given $h_0 > 0$, there are only finitely many points $P \in E(\mathbb{Q})$ with
$h(P) < h_0$.

PROOF. (2) is clear since only finitely many rationals $\frac{m}{n}$ (in lowest terms)
have $\log \max\{|m|, |n|\} < h_0$.
To prove (1), write $P = (x, y)$ and $P_0 = (x_0, y_0)$. From the equation

$$y^2 = x^3 + ax + b,$$

write (in lowest terms)

$$x = \frac{r}{t^2}, \quad y = \frac{s}{t^3}, \quad x_0 = \frac{r_0}{t_0^2}, \quad y_0 = \frac{s_0}{t_0^3},$$

with $r, s, t, r_0, s_0, t_0$ all integers. Then

$$\begin{aligned}
x(P + P_0) &= \left( \frac{y_0 - y}{x_0 - x} \right)^2 - x_0 - x \\
&= \frac{y_0^2 - 2y_0y + y^2 - (x_0 + x)(x_0^2 - 2x_0x + x^2)}{(x_0 - x)^2} \\
&= \frac{x_0^3 + ax_0 + b + x^3 + ax + b - 2y_0y}{(x_0 - x)^2} - \frac{x_0^3 - 2x_0^2x + x_0x^2 + x_0^2x - 2x_0x^2 + x^3}{(x_0 - x)^2} \\
&= \frac{a(x_0 + x) + 2b - 2y_0y - (-x_0^2x - x_0x^2)}{(x_0 - x)^2} \\
&= \frac{a(x_0 + x) + 2b - 2y_0y + x_0x(x_0 + x)}{(x_0 - x)^2} \\
&= \frac{(a + x_0x)(x_0 + x) + 2b - 2y_0y}{(x_0 - x)^2}.
\end{aligned}$$

Substituting $r, s, t$ then gives

$$\begin{aligned}
x(P + P_0) &= \frac{\left( a + \frac{r_0 r}{t_0^2 t^2} \right)\left( \frac{r_0}{t_0^2} + \frac{r}{t^2} \right) + 2b - \frac{2 s_0 s}{t_0^3 t^3}}{\left( \frac{r_0}{t_0^2} - \frac{r}{t^2} \right)^2} \\
&= \frac{(a t^2 t_0^2 + r_0 r)(r_0 t^2 + r t_0^2) + 2b t^4 t_0^4 - 2 s s_0 t t_0}{(r_0 t^2 - r t_0^2)^2}.
\end{aligned}$$

The effect of clearing denominators in the rationals $a$ and $b$ appearing as
coefficients in the elliptic curve can be absorbed into the constant $c_1$. It is
therefore sufficient to check that the numerator and denominator satisfy the
inequality in Equation (7.3).

First

$$|\text{numerator}| \le \underbrace{(c_2|t|^2 + c_3|r|)(c_4|t|^2 + c_5|r|)}_{\le\, c_8 (\max\{|r|, |t|^2\})^2} + c_6|t|^4 + c_7|st|. \tag{7.4}$$

The first two terms have

$$c_8 (\max\{|r|, |t|^2\})^2 + c_6|t|^4 \le c_9 H(P)^2.$$

Looking at the third term, we need to show that

$$|st| \le c_{10} H(P)^2.$$

Since $y^2 = x^3 + ax + b$, we have

$$(st)^2 = r^3t^2 + art^6 + bt^8.$$

There are two cases to consider.
I: $|r| \ge |t|^2$. In this case

$$|st|^2 \le c_{11}|r|^4 + c_{12}|r|^4 + c_{13}|r|^4 = c_{14} H(P)^4.$$

II: $|r| \le |t|^2$. In this case

$$|st|^2 \le c_{15}|t|^8 + c_{16}|t|^8 + c_{17}|t|^8 = c_{18} H(P)^4.$$

In both cases, $|st| \le c_{10} H(P)^2$, as required. Therefore

$$|\text{numerator}| \le c_{19} H(P)^2,$$

or in logarithmic form

$$\log |\text{numerator}| \le c_{20} + 2h(P).$$

The denominator is simpler:

$$\begin{aligned}
|\text{denominator}| &= |r_0 t^2 - r t_0^2|^2 \\
&\le (c_{21}|t|^2 + c_{22}|r|)^2 \\
&\le c_{23} H(P)^2,
\end{aligned}$$

so

$$\log |\text{denominator}| \le c_{24} + 2h(P).$$
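PROOF OF THEOREM 5.11 ASSUMING THEOREM 7.6. Let $\mathcal{Q} = \{Q_1, \ldots, Q_s\}$
denote a fixed set of coset representatives for $2E(\mathbb{Q})$ in $E(\mathbb{Q})$. By Theorem 7.6,
it is enough to prove the following: There is a constant $R = R(E, \mathcal{Q})$ with
the property that every point $P \in E(\mathbb{Q})$ can be expressed as an integral
combination of the $Q_i$, $i = 1, \ldots, s$, and those $Q \in E(\mathbb{Q})$ with $h(Q) \le R$
(finite in number by Lemma 7.7(2)).

Let $P$ be a rational point on $E$, and write (for some $i_0, i_1, \ldots \in \{1, \ldots, s\}$)

$$\begin{aligned}
P = P_0 &= Q_{i_0} + 2P_1 \\
P_1 &= Q_{i_1} + 2P_2 \\
&\;\;\vdots \\
P_n &= Q_{i_n} + 2P_{n+1}.
\end{aligned}$$

The duplication formula Equation (7.1) says that

$$4h(P_{n+1}) - c_1 \le h(2P_{n+1}) \tag{7.5}$$

for some $c_1 = c_1(E) > 0$. On the other hand, Lemma 7.7(1) shows that

$$h(2P_{n+1}) = h(P_n - Q_{i_n}) \le 2h(P_n) + c_2 \tag{7.6}$$

for some $c_2 = c_2(E, \mathcal{Q})$. Combining Equation (7.5) and Equation (7.6) gives

$$h(P_{n+1}) \le \tfrac{1}{2} h(P_n) + c_3$$

for some $c_3 = c_3(E, \mathcal{Q})$. Iterating this gives

$$\begin{aligned}
h(P_{n+1}) &\le \tfrac{1}{2}\left( \tfrac{1}{2} h(P_{n-1}) + c_3 \right) + c_3 \\
&= \frac{1}{2^2} h(P_{n-1}) + c_3 \left( 1 + \tfrac{1}{2} \right) \\
&\le \frac{1}{2^{n+1}} h(P_0) + c_3 \left( 1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n} \right).
\end{aligned}$$

As $n \to \infty$, for fixed $P_0$,

$$\frac{1}{2^{n+1}} h(P_0) + c_3 \left( 1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n} \right) \to 2c_3. \tag{7.7}$$

Now $P = Q_{i_0} + 2P_1$, $P_1 = Q_{i_1} + 2P_2$ and so on gives

$$\begin{aligned}
P &= Q_{i_0} + 2(Q_{i_1} + 2P_2) \\
&= Q_{i_0} + 2Q_{i_1} + 2^2 P_2 \\
&= Q_{i_0} + 2Q_{i_1} + 2^2 Q_{i_2} + 2^3 P_3
\end{aligned}$$

so $P$ is being written as an integer combination of elements of $\mathcal{Q}$ and a point
whose height is uniformly bounded as $n \to \infty$ by Equation (7.7).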

Take R = (2+ 34)c3, a constant depending on E and Q. We have shown 
that any rational point P can be written as an integral combination of the 
points of Q and a point with height bounded by R.

7.3 The Weak Mordell Theorem: Congruent Number Curve


We will now give a proof of Theorem 7.6 for the congruent number curve
\[ y^2 = x^3 - n^2x, \]
where $n > 0$ denotes an integer. The proof uses a homomorphism from $E(\mathbb{Q})$
to a quotient group of the group of nonzero rationals and we begin by
introducing this group. Let $\mathsf{Q}$ denote $\mathbb{Q}^*/\mathbb{Q}^{*2}$, which is the quotient of the group of
nonzero rationals by the subgroup of all nonzero squares. The representatives
for this group can be taken to be all nonzero integers $r$ which are not divisible
by the square of a prime. We will write $\bar{r}$ for the coset containing $r$. Notice
that the identity of the group is $\bar{1}$ and the element $\overline{-1}$ is an element of order 2
in $\mathsf{Q}$. In this section, the point $(0,0)$ will play a distinguished role and will be
denoted $T = (0,0)$.


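Lemma 7.8. Define a map $\phi_1 : E(\mathbb{Q}) \to \mathsf{Q}$ by
\[ \phi_1(O) = \bar{1}, \qquad \phi_1(T) = \overline{-1}, \qquad \phi_1((x,y)) = \bar{x} \ \text{ otherwise.} \]
Then $\phi_1$ is a group homomorphism.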


This is a remarkable claim. If you try to prove it simply using the addition 
formula it can be difficult to dig out, and might even start to look impossible. 
We will use a simple trick already encountered to make it come out quite
smoothly. The reason $\phi$ in the definition carries the suffix 1 is that we will
define two other similar maps shortly.


PROOF OF LEMMA 7.8. Let $P_1$ and $P_2$ denote rational points with
\[ P_1 + P_2 = P_3. \]
We wish to deduce that $\phi_1(P_3) = \phi_1(P_1)\phi_1(P_2)$. There are a number of special
cases to be considered before we can deal with the general situation. The only
nontrivial special case which requires any work arises when one of $P_1$ or $P_2$ is
the 2-torsion point $T = (0,0)$. Say we add $P = (x,y)$ to $T$, where $x \ne 0$. The
image of the sum under $\phi_1$ is the class of
\[ (y/x)^2 - x = (y^2 - x^3)/x^2 = -n^2x/x^2 = -n^2/x, \]
which equals $-x$ up to a rational square, and $\overline{-x} = \phi_1(P)\phi_1(T)$;
hence the result is true in this special case. An almost identical proof gives
the case where $P_3 = T = (0,0)$.

Recall Section 5.3, where we converted the geometric addition on an elliptic
curve into explicit formulas. The group law on an elliptic curve tells us that
the points $P_1$, $P_2$ and $-P_3$ lie on the same straight line. Writing $P_1 + P_2 = P_3$
with $P_i = (x_i, y_i)$, we need to show that $x_1x_2x_3$ is a rational square. From the
above, we may assume that $x_1$, $x_2$, and $x_3$ are nonzero rational numbers.
Let the line containing the points $P_1$, $P_2$ and $-P_3$ be written $y = \alpha x + \beta$,
for rationals $\alpha$ and $\beta$. Our assumptions guarantee that $\beta \ne 0$. Substitute the
equation of the line into the equation of the curve to get
\[ x^3 - n^2x - (\alpha x + \beta)^2 = 0. \]
The roots of this equation are the three rational numbers $x_1$, $x_2$ and $x_3$ because
it is this equation which defines them. Hence we can factorize the left-hand
side as
\[ (x - x_1)(x - x_2)(x - x_3). \]
Now if we compare the constant terms of the two expressions (see Exercise 5.11 on p. 105) we see
that $x_1x_2x_3$ is equal to $\beta^2$, the square of a rational. In other words, up to a
rational square $x_1x_2$ and $x_3$ are equal; hence
\[ \phi_1(P_1 + P_2) = \phi_1(P_1)\phi_1(P_2). \]
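
The key identity in this proof, that $x_1x_2x_3$ is a rational square whenever $P_1 + P_2 = P_3$, can be checked on examples with exact rational arithmetic. The following is a minimal sketch, not part of the original argument: it assumes the curve $y^2 = x^3 - 36x$ (the case $n = 6$), and the helper names `add` and `is_rational_square` are illustrative choices.

```python
from fractions import Fraction
from math import isqrt

N = 6  # illustrative choice: the congruent number curve y^2 = x^3 - 36x


def add(P, Q):
    """Chord-and-tangent addition on y^2 = x^3 - N^2 x (None = point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return None  # P + (-P) is the identity
    if P == Q:
        lam = (3 * x1 * x1 - N * N) / (2 * y1)  # slope of the tangent
    else:
        lam = (y2 - y1) / (x2 - x1)  # slope of the chord
    x3 = lam * lam - x1 - x2
    return (x3, lam * (x1 - x3) - y1)


def is_rational_square(q):
    """True if the rational q is the square of a rational."""
    q = Fraction(q)
    if q < 0:
        return False
    n, d = q.numerator, q.denominator  # already in lowest terms
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d


P1 = (Fraction(-3), Fraction(9))
P2 = (Fraction(12), Fraction(36))  # this is P1 + T with T = (0, 0)
P3 = add(P1, P2)
print(P3[0], is_rational_square(P1[0] * P2[0] * P3[0]))
```

Here $x_1x_2x_3 = (-3)(12)(-144/25) = (72/5)^2$, in agreement with the lemma.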


Exercise 7.7. Verify that Lemma 7.8 is true for an elliptic curve of the form
\[ y^2 = x^3 + ax^2 + bx \]
with the same definition of $\phi_1$.


We have already indicated that $E(\mathbb{Q})$ can be an infinite group. The second
lemma says that even if that is true, the image of this group under $\phi_1$ is a
finite group.


Lemma 7.9. The image of $E(\mathbb{Q})$ under $\phi_1$ is a finite subgroup of $\mathsf{Q}$.


PROOF. Suppose $\bar{r}$ lies in the image of $\phi_1$. Without loss of generality, assume $r$
is a square-free integer. We claim that $r \mid n$. To prove this, suppose $p$ is a prime
with $p \mid r$; we will show that $p \mid n$. The statement $\phi_1((x,y)) = \bar{r}$ amounts to two
equations
\[ x^2 - n^2 = rs^2, \qquad x = rt^2 \]
for rationals $s$ and $t$. Now clear denominators by writing $t = a/b$ for coprime
integers $a$ and $b$. Eliminating $x$, we obtain an equation
\[ r^2a^4 - n^2b^4 = rc^2 \]
for some integer $c$. If $p \mid r$ but $p \nmid n$ then $p \mid b$ and therefore $p^4 \mid n^2b^4$. Thus $p^2$
must divide the left-hand side and it follows that $p^2$ must divide the right-hand
side. Since $r$ is square-free, it follows that $p \mid c$ so $p^2 \mid c^2$ and hence $p^3$
divides the right-hand side. This forces $p^3$ to divide $r^2a^4$ so $p \mid a$ (since $r$ is
square-free). Thus $p$ divides $a$ and $b$, which contradicts the assumption that
they are coprime.


Exercise 7.8. For the elliptic curve $E$ defined by $y^2 = x^3 - 36x$, the torsion-free
part of $E(\mathbb{Q})$ is generated by the rational point $(-3,9)$ (you may assume
this). Find the image of $E(\mathbb{Q})$ under the map $\phi_1$.


Exercise 7.9. Suppose $E$ denotes an elliptic curve and $p$ and $q$ denote rational
numbers. The map $x \mapsto x - p$, $y \mapsto y - q$ takes rational points on this curve
to rational points on a new elliptic curve. Assume that the point at infinity
on the first curve maps to the point at infinity on the second (this can be
verified by taking limits as before). Show that the resulting map is a group
isomorphism. In the language of Section 5.5, the map is an isogeny of degree
one.


Exercise 7.10. Define a map $\phi_2 : E(\mathbb{Q}) \to \mathsf{Q}$ by
\[ \phi_2((x,y)) = \overline{x - n} \]
whenever $x \ne n$.
Prove that $\phi_2$ is a group homomorphism. (Hint: Compose this map with a
suitable translation map and use Exercise 7.9.)


In a similar vein to Exercise 7.10, we can define a map $\phi_3 : E(\mathbb{Q}) \to \mathsf{Q}$ by
\[ \phi_3((x,y)) = \overline{x + n} \]
whenever $x \ne -n$.
Exercise 7.11. Show that both of the maps $\phi_2$ and $\phi_3$ have finite image in $\mathsf{Q}$.
Our goal is in sight now. Combine the three maps into one by defining
\[ \phi : E(\mathbb{Q}) \to \mathsf{Q}^3 \]
to be
\[ \phi(P) = (\phi_1(P), \phi_2(P), \phi_3(P)). \]


Earlier on we showed that the doubling map on a rational point on the
congruent number curve $E : y^2 = x^3 - n^2x$ produced an $x$-coordinate which is
the square of a rational, provided the starting point does not have order 2.
This suggests that we might find $2E(\mathbb{Q})$ inside the kernel of $\phi$. More is true.
Lemma 7.10. The kernel of $\phi$ is precisely $2E(\mathbb{Q})$. In other words the rational
point $P = (x,y)$ is the double of a rational point if and only if $x$ and $x \pm n$
are all rational squares. Explicitly, write
\[ x = r_1^2, \qquad x + n = r_2^2, \qquad x - n = r_3^2 \quad \text{for } r_i \in \mathbb{Q}. \]
Then $P = 2Q$ where $Q = (X,Y)$ and $X$ and $Y$ are given by the formulas
\[ X = x + r_1r_2 + r_1r_3 + r_2r_3, \]
\[ Y = (r_1 + r_2 + r_3)(X - x) - y, \]
provided the signs of the $r_i$ are chosen so that $r_1r_2r_3 = y$.


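Example 7.11. Let $n = 6$. The point $Q = (-3,9)$ doubles to the point $(\tfrac{25}{4}, -\tfrac{35}{8})$
on the curve $y^2 = x^3 - 36x$. This is verified by taking $r_1 = \tfrac{5}{2}$, $r_2 = -\tfrac{7}{2}$,
$r_3 = \tfrac{1}{2}$, so that $r_1r_2r_3 = -\tfrac{35}{8} = y$. As expected,
\[ X = \frac{25}{4} - \frac{5\cdot 7}{2\cdot 2} + \frac{5}{2\cdot 2} - \frac{7}{2\cdot 2} = -3, \]
and similarly $Y = 9$.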
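
The arithmetic of Example 7.11 can be replayed exactly. The sketch below is an illustration under stated assumptions rather than a proof of Lemma 7.10: the function names `half_point` and `double` are hypothetical, and the sign choice for the $r_i$ is the one satisfying $r_1r_2r_3 = y$.

```python
from fractions import Fraction as F

n = 6  # the curve y^2 = x^3 - 36x from Example 7.11


def half_point(x, y, r1, r2, r3):
    """Apply the formulas of Lemma 7.10; assumes r1*r2*r3 == y."""
    X = x + r1 * r2 + r1 * r3 + r2 * r3
    Y = (r1 + r2 + r3) * (X - x) - y
    return X, Y


def double(X, Y):
    """Tangent-line doubling on y^2 = x^3 - n^2 x."""
    lam = (3 * X * X - n * n) / (2 * Y)
    x3 = lam * lam - 2 * X
    return x3, lam * (X - x3) - Y


x, y = F(25, 4), F(-35, 8)
r1, r2, r3 = F(5, 2), F(-7, 2), F(1, 2)  # r1^2 = x, r2^2 = x + n, r3^2 = x - n
X, Y = half_point(x, y, r1, r2, r3)
print(X, Y, double(X, Y))
```

Doubling the recovered point $(X, Y)$ with the tangent formula returns the original point $(x, y)$, which is the content of the lemma in this instance.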


The proof of Lemma 7.10 is purely computational and we leave the verifi- 
cation as an exercise. The burden of explanation rests on the question of why 
it should be true in the first place. In one sense it is not wrong to say it comes 
down to Mordell’s genius. The notes at the end of the chapter include a useful 
reference which suggests how Mordell might have come upon this remarkable 
phenomenon. 


Exercise 7.12. Suppose $E$ is an elliptic curve defined by the equation
\[ E : y^2 = x^3 + ax^2 + bx + c \]


where a,b and c are rational. Assuming the roots of the cubic are all rational, 
adapt the proof above to deduce the weak Mordell Theorem for E. 


In the general case, the technicalities of the proof are no greater from the 
point of view of elliptic curves. What is required is a deeper knowledge of 
algebraic number fields. 

During this section, we have seen how homomorphisms between elliptic 
curves, or homomorphisms from elliptic curves to other groups, played an 
important role. Although we will not develop this any further, it is worth being 
aware of the importance of the map which reduces modulo p, for a prime p. 
This map takes an elliptic curve defined over $\mathbb{Q}$ to one defined over $\mathbb{F}_p$. Since
all the group operations are defined by rational functions, we should not be 
surprised that the map is a group homomorphism (though this does of course 
require that the reduced curve is really an elliptic curve.) More remarkably, the 
notion of “infinity” as the identity of the group is quite robust. The following 
exercise gives an opportunity to encounter this phenomenon.
Exercise 7.13. Suppose $E$ is an elliptic curve defined by the equation
\[ E : y^2 = x^3 + ax + b, \quad a, b \in \mathbb{Z}. \]
Let $p$ denote a prime number coprime to $4a^3 + 27b^2$ and let $E_1(\mathbb{Q})$ denote
the set of rational points $(x,y)$ on the curve with the property that the
denominators of $x$ and $y$ are divisible by $p$, together with the point at infinity.
Prove that $E_1(\mathbb{Q})$ is a subgroup of $E(\mathbb{Q})$. (Hint: Resist the temptation to do
this using the functions defining addition. What is the kernel of the reduction
map?)


7.3.1 The Generation Game 


We have seen some examples of elliptic curves with rank 1; for example the
curve given by the equation $y^2 = x^3 - 2$, with the generator $(3,5)$, and also the
congruent number curve for $n = 6$, which is generated by $(-3,9)$. It is natural
to ask how the rank can be proved to be 1 and how these can be proved to be 
the generators. Although many special cases have been worked out, in general 
there is no algorithm known for determining the rank of an elliptic curve nor 
for finding a set of generators. In the notes at the end of the chapter, several 
of the references provide details about how special cases can be approached, 
as well as links to massive tables of curves whose ranks have been computed, 
along with systems of generators. We recommend as a worthwhile exercise, 
doing some computations with some of these curves using a computer algebra 
package. 


7.4 The Parallelogram Law and the Canonical Height 


The duplication formula Equation (7.1) says that for any $P \in E(\mathbb{Q})$
\[ h(2P) = 4h(P) + O(1), \]
or, equivalently, there is a constant $c = c(E)$ such that
\[ \left|h(P) - \tfrac{1}{4}h(2P)\right| \le c. \tag{7.8} \]

The next result exploits this to produce a height function with better
functorial properties, the canonical height. The approach below is due to Tate; the
canonical height was discovered independently by Néron.


Theorem 7.12. For any rational point $P$ on an elliptic curve $E$ defined over
the rationals,
\[ \lim_{n\to\infty} \frac{h(2^nP)}{4^n} = \hat{h}(P) \tag{7.9} \]
exists. The limit $\hat{h}(P)$ is called the canonical height of $P$.



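PROOF. Let $a_n = \frac{1}{4^n}h(2^nP)$. If $N > M \ge 1$, then
\[ a_M - a_N = \frac{1}{4^M}h(2^MP) - \frac{1}{4^N}h(2^NP) \]
\[ = \frac{1}{4^M}h(2^MP) - \frac{1}{4^{M+1}}h(2^{M+1}P) + \frac{1}{4^{M+1}}h(2^{M+1}P) - \frac{1}{4^{M+2}}h(2^{M+2}P) + \cdots - \frac{1}{4^N}h(2^NP), \]
which may be grouped into
\[ a_M - a_N = \frac{1}{4^M}\left(h(2^MP) - \tfrac{1}{4}h(2\cdot 2^MP)\right) + \frac{1}{4^{M+1}}\left(h(2^{M+1}P) - \tfrac{1}{4}h(2\cdot 2^{M+1}P)\right) + \cdots + \frac{1}{4^{N-1}}\left(h(2^{N-1}P) - \tfrac{1}{4}h(2\cdot 2^{N-1}P)\right). \]
By the duplication formula (Theorem 7.1), in the form of Equation (7.8), this gives
\[ |a_M - a_N| \le \frac{c}{4^M}\left(1 + \frac{1}{4} + \frac{1}{4^2} + \cdots\right) = \frac{c}{4^M}\left(\frac{4}{3}\right) \to 0 \text{ as } M \to \infty, \]
showing that $(a_n)$ is a Cauchy sequence.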
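
The convergence of $a_n = h(2^nP)/4^n$ can be watched numerically. The following is a minimal sketch, assuming the naive height $h(P) = \log\max\{|r|,|s|\}$ for $x(P) = r/s$ in lowest terms and the curve $y^2 = x^3 - 36x$ with $P = (-3,9)$; the function names are illustrative and the printed values are only approximations to the limit.

```python
from fractions import Fraction
import math

A, B = -36, 0  # illustrative curve y^2 = x^3 + A x + B


def double(P):
    """Tangent-line doubling; assumes y != 0 (P is not a 2-torsion point)."""
    x, y = P
    lam = (3 * x * x + A) / (2 * y)
    x3 = lam * lam - 2 * x
    return (x3, lam * (x - x3) - y)


def naive_height(P):
    """h(P) = log max(|numerator|, |denominator|) of the x-coordinate."""
    x = P[0]
    return math.log(max(abs(x.numerator), abs(x.denominator)))


def height_approximations(P, steps):
    """Return [a_0, ..., a_steps] with a_n = h(2^n P) / 4^n."""
    out = []
    for n in range(steps + 1):
        out.append(naive_height(P) / 4 ** n)
        P = double(P)
    return out


approximations = height_approximations((Fraction(-3), Fraction(9)), 4)
for n, a_n in enumerate(approximations):
    print(n, round(a_n, 6))
```

Successive terms differ by at most a constant multiple of $4^{-n}$, as in the Cauchy estimate above, so a handful of doublings already stabilizes several decimal places.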


If the order of $P$ is a power of 2, then $\hat{h}(P) = 0$. In fact, any torsion
point $P$ has $\hat{h}(P) = 0$, and moreover $\hat{h}(P) = 0$ implies that $P$ is a torsion
point by Theorem 7.13(4).


Theorem 7.13. Let $E$ be an elliptic curve defined over the rationals.

(1) For every point $P \in E(\mathbb{Q})$,
\[ \hat{h}(P) = h(P) + O(1) \]
uniformly.

(2) For all $P, Q \in E(\mathbb{Q})$,
\[ \hat{h}(P + Q) + \hat{h}(P - Q) = 2\hat{h}(P) + 2\hat{h}(Q). \tag{7.10} \]

(3) For every $m \in \mathbb{Z}$ and $P \in E(\mathbb{Q})$,
\[ \hat{h}(mP) = m^2\hat{h}(P). \]

(4) For $P \in E(\mathbb{Q})$,
\[ \hat{h}(P) = 0 \text{ if and only if } P \text{ is a torsion point.} \]


This is proved below. Equation (7.10) is called the parallelogram law. It
follows from (1) and (3) in Theorem 7.13 that
\[ h(mP) = m^2\hat{h}(P) + O(1), \]


which is a weaker version of Theorem 5.2. A more useful generalization of this 
formula is the parallelogram law for the naive height. This will be stated now, 
then Theorem 7.13 will be proved. The parallelogram law for the naive height 
will be proved in Section 7.5. 


Lemma 7.14. For all $P, Q \in E(\mathbb{Q})$,
\[ h(P + Q) + h(P - Q) = 2h(P) + 2h(Q) + O(1) \tag{7.11} \]
uniformly.


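PROOF OF THEOREM 7.13.
(1) By iterating the relation
\[ h(P) = \tfrac{1}{4}\left(h(2P) + O(1)\right), \]
we have
\[ h(P) = \tfrac{1}{4}\left(\tfrac{1}{4}\left(h(2^2P) + O(1)\right) + O(1)\right) = \frac{h(2^2P)}{4^2} + O(1)\left(\frac{1}{4} + \frac{1}{4^2}\right) = \cdots = \frac{h(2^NP)}{4^N} + O(1)\left(\frac{1}{4} + \frac{1}{4^2} + \cdots + \frac{1}{4^N}\right). \]
Letting $N \to \infty$ gives
\[ h(P) = \hat{h}(P) + O(1). \]

(2) Applying a similar limiting procedure to the naive parallelogram law
Equation (7.11) gives
\[ \hat{h}(P + Q) + \hat{h}(P - Q) - 2\hat{h}(P) - 2\hat{h}(Q) = \lim_{N\to\infty}\left(\frac{1}{4^N}h(2^N(P + Q)) + \frac{1}{4^N}h(2^N(P - Q)) - \frac{2}{4^N}h(2^NP) - \frac{2}{4^N}h(2^NQ)\right) = \lim_{N\to\infty}\left(\frac{1}{4^N}O(1)\right) = 0. \]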


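(3) This is proved by induction on $m \ge 1$. The case $m \le -1$ follows since
\[ \hat{h}(-P) = \hat{h}(P), \]
because $P$ and $-P$ have the same $x$-coordinate.
For $m = 0$, $\hat{h}(O) = 0 = h(O)$. Assume therefore that
\[ \hat{h}(kP) = k^2\hat{h}(P) \quad \text{for all } 0 \le k \le m, \]
and substitute $mP$ for $P$ and $P$ for $Q$ in the parallelogram law Equation (7.10):
\[ \hat{h}(mP + P) = 2\hat{h}(mP) + 2\hat{h}(P) - \hat{h}((m-1)P) = 2m^2\hat{h}(P) + 2\hat{h}(P) - (m-1)^2\hat{h}(P) = (m+1)^2\hat{h}(P). \]

(4) If $P$ is a torsion point, then $mP = O$ for some $m \ne 0$, so by (3) $\hat{h}(P) = 0$.
Conversely, suppose that $\hat{h}(P) = 0$ for some $P \in E(\mathbb{Q})$. Then
\[ \hat{h}(mP) = m^2\hat{h}(P) = 0 \text{ for all } m, \]
so $h(mP)$ must be uniformly bounded for all $m$ by (1). By Lemma 7.7(2), this
means that the set $\{mP\}_{m\in\mathbb{Z}}$ must be finite, so $P$ is a torsion point.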


7.4.1 A Strong Form of Siegel’s Theorem 


The result that follows we call the Strong Siegel Theorem; it was proved by 
Silverman and we will not prove it here. It relates the growth rates of the 
numerators and the denominators of the multiples nP of a nontorsion rational 
point. 


Theorem 7.15. [STRONG SIEGEL THEOREM] Let $E$ denote an elliptic curve
defined over $\mathbb{Q}$ and suppose $P \in E(\mathbb{Q})$ denotes a nontorsion point. Let $(P_n)$
be any sequence of rational points with $h(P_n) \to \infty$ as $n \to \infty$, and write
\[ P_n = \left(\frac{A_n}{B_n^2}, \frac{C_n}{B_n^3}\right). \]
Then
\[ \frac{\log|A_n|}{2\log|B_n|} \to 1 \quad \text{as } n \to \infty. \]
This can be interpreted as saying that the numerators and denominators
of $P_n$ have roughly the same number of decimal digits for large $n$. Theorem 5.2
follows from this, together with Theorem 7.13. A particular case of a rational
sequence $(P_n)$ with $h(P_n) \to \infty$ is given by taking $P_n = nP$ for a rational
nontorsion point P on an elliptic curve defined over the rationals. 


7.5 Mahler Measure and the Naive Parallelogram Law 


In proving Lemma 7.14, some simple estimates on polynomials will be needed, 
and one way to phrase them is to use the Mahler measure, which is of inde- 
pendent interest. There are several natural ways to measure the size of a 
polynomial in such a way that an integer polynomial with zeros of large arith- 
metic complexity will have large measure. 

Definition 7.16. For any nonzero polynomial
\[ F(x) = a_dx^d + a_{d-1}x^{d-1} + \cdots + a_0 = a_d\prod_{i=1}^{d}(x - \alpha_i) \]
in $\mathbb{C}[x]$, define three measures as follows.

(1) The Mahler measure of $F$ is $M(F) = |a_d|\cdot\prod_{i=1}^{d}\max\{1, |\alpha_i|\}$.
(2) The height of $F$ is $H(F) = \max_{0\le i\le d}\{|a_i|\}$.
(3) The length of $F$ is $L(F) = \sum_{i=0}^{d}|a_i|$.

In (1), an empty product is assumed to be 1, so the Mahler measure of
the nonzero constant polynomial $F(x) = a_0$ is $|a_0|$. Write $m(F) = \log M(F)$
for the logarithmic Mahler measure of $F$.
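
To make the three measures concrete, here is a small sketch that computes $M$, $H$ and $L$ for a polynomial specified by its leading coefficient and roots. The representation and the function names are assumptions made for illustration; the example polynomial $2(x-2)(x-\tfrac12) = 2x^2 - 5x + 2$ is likewise an arbitrary choice.

```python
import math


def coefficients(lead, roots):
    """Coefficients [a_0, ..., a_d] of lead * prod(x - root)."""
    coeffs = [lead]
    for root in roots:
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] += c          # contribution of x * (c x^i)
            new[i] -= root * c       # contribution of -root * (c x^i)
        coeffs = new
    return coeffs


def mahler_measure(lead, roots):
    return abs(lead) * math.prod(max(1, abs(r)) for r in roots)


def height(coeffs):
    return max(abs(c) for c in coeffs)


def length(coeffs):
    return sum(abs(c) for c in coeffs)


lead, roots = 2, [2, 0.5]
coeffs = coefficients(lead, roots)   # 2x^2 - 5x + 2, stored as [2, -5, 2]
M, H, L = mahler_measure(lead, roots), height(coeffs), length(coeffs)
print(coeffs, M, H, L)
```

For this polynomial $M = 4$, $H = 5$ and $L = 9$, so the three measures differ only by factors bounded in terms of the degree.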

Mahler showed that
\[ |a_i| \le \binom{d}{i}M(F) \quad \text{for all } i = 0,\dots,d \]
and also showed that all three measures are commensurate in the sense that
\[ H(F) \ll M(F) \ll H(F) \]
and
\[ L(F) \ll M(F) \ll L(F), \tag{7.12} \]
with the implied constants depending only on the degree $d$.
The absolute value of the discriminant of $F$ is defined to be
\[ |\Delta(F)| = |a_d|^{2d-2}\prod_{i\ne j}|\alpha_i - \alpha_j|. \]
Mahler also showed that
\[ |\Delta(F)| \le d^dM(F)^{2d-2}. \tag{7.13} \]


Exercise 7.14. (a) Prove that
\[ -d\log 2 + \ell(F) \le m(F) \le \ell(F), \]
where we write $\ell = \log L$. This is equivalent to an exact description of the
implied constants in Equation (7.12):
\[ 2^{-d}L(F) \le M(F) \le L(F). \]
(b) Prove a weaker form of the inequality (7.13) as follows. Assume that
\[ F(x) = x^d + a_{d-1}x^{d-1} + \cdots + a_0 = \prod_{1\le i\le d}(x - \alpha_i) \]
is monic, so the absolute value of the discriminant is
\[ |\Delta(F)| = \prod_{i\ne j}|\alpha_i - \alpha_j|. \]
Prove that 
|A(F)| < Pda AG cua 


Exercise 7.15. Fix a polynomial
\[ F(x) = a_dx^d + \cdots + a_0 = a_d\prod_{i=1}^{d}(x - \alpha_i) \]
in $\mathbb{Z}[x]$. Call $F$ hyperbolic if $|\alpha_i| \ne 1$ for all $i = 1,\dots,d$ and ergodic if $\alpha_i^k = 1$
for some $k \ge 0$ and any $i$ implies that $k = 0$.
(a) Prove that
\[ m(F) = \int_0^1 \log|F(e^{2\pi is})|\,ds \tag{7.14} \]
when $F$ is hyperbolic.
(b) Prove Equation (7.14) without assuming that $F$ is hyperbolic.
(c) Prove that $\Delta_n(F) = \prod_{i=1}^{d}|\alpha_i^n - 1|$ is an integer for all $n$. For $F$ hyperbolic,
prove that
\[ \lim_{n\to\infty}\Delta_n(F)^{1/n} = \lim_{n\to\infty}\frac{\Delta_{n+1}(F)}{\Delta_n(F)} = M(F). \]
(d) Prove that an ergodic polynomial of degree $d \le 3$ is hyperbolic.
(e) Find a polynomial that is ergodic but not hyperbolic.
(f)*For $F$ ergodic but not hyperbolic, prove that
\[ \lim_{n\to\infty}\Delta_n(F)^{1/n} = M(F) \]
but that $\frac{\Delta_{n+1}(F)}{\Delta_n(F)}$ does not converge as $n \to \infty$.
Exercise 7.16. [KRONECKER'S LEMMA] Prove that a polynomial $F \in \mathbb{Z}[x]$
has $m(F) = 0$ if and only if every zero $\lambda$ of $F$ satisfies $\lambda^k = 1$ for some $k \ge 1$.


Exercise 7.17. Considerable interest has been shown in the set of values of
the Mahler measure of integer polynomials.
(a) Compute $m(F)$ to 3 decimal places when
\[ F(x) = x^{10} + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1. \tag{7.15} \]
(b)*Explore the mathematical literature on Lehmer's Problem: Is there an
integer polynomial $G$ with $m(G) > 0$ and with $m(G) < m(F)$? More generally,
given arbitrary $\epsilon > 0$, is there an integer polynomial $H$ with $m(H) > 0$
and $m(H) < \epsilon$? Extensive calculations have been made of values of the Mahler
measure for monic polynomials, and no nonzero value smaller than $m(F)$ has
been found.


PROOF OF LEMMA 7.14. Let $E : y^2 = x^3 + ax + b$ be the elliptic curve. Let $P$
and $Q$ be points in $E(\mathbb{Q})$, and write $x(P) = x_1$, $x(Q) = x_2$, $x(P + Q) = x_3$,
and $x(P - Q) = x_4$.

The values of $x_3$ and $x_4$ depend on the $y$ coordinates of $P$ and $Q$, which
complicates the proof considerably. We will work in the coordinates
\[ x_1x_2, \quad x_1 + x_2, \quad x_3x_4, \quad \text{and} \quad x_3 + x_4, \]
because these only depend on the $x$ coordinates. Now
\[ x_3 + x_4 = \frac{2(x_1 + x_2)(a + x_1x_2) + 4b}{(x_1 - x_2)^2}, \qquad x_3x_4 = \frac{(x_1x_2 - a)^2 - 4b(x_1 + x_2)}{(x_1 - x_2)^2}, \]
and we may write
\[ (x_1 - x_2)^2 = (x_1 + x_2)^2 - 4x_1x_2, \]
giving $x_3 + x_4$ and $x_3x_4$ in terms of $x_1x_2$ and $x_1 + x_2$.
We claim that for any $x_1, x_2 \in \mathbb{Q}$,
\[ h([x_1 + x_2, x_1x_2, 1]) = h([x_1, 1]) + h([x_2, 1]) + O(1). \tag{7.16} \]
To see this, write $x_1 = \frac{s}{t}$, $x_2 = \frac{u}{v}$ in lowest terms, and define
\[ F_1(x) = tx - s, \qquad F_2(x) = vx - u. \]
Then, since the Mahler measure is multiplicative,
\[ m(F_1F_2) = m(F_1) + m(F_2). \tag{7.17} \]
Now by Equation (7.12) and Exercise 7.14,
\[ m\left(\sum_{i=0}^{d}a_ix^i\right) = h\left(\sum_{i=0}^{d}a_ix^i\right) + O(d), \]
where $h\left(\sum_{i=0}^{d}a_ix^i\right) = \log\max\{|a_i|\}$. Applying this to Equation (7.17) gives
\[ h(F_1F_2) = h(F_1) + h(F_2) + O(1). \tag{7.18} \]
Now
\[ h(F_1) = h([x_1, 1]), \qquad h(F_2) = h([x_2, 1]). \tag{7.19} \]
On the other hand,
\[ F_1(x)F_2(x) = (tx - s)(vx - u) = tvx^2 - (sv + tu)x + su, \]
so
\[ h(F_1F_2) = \log\max\{|tv|, |sv + tu|, |su|\}. \]
Now
\[ x_1 + x_2 = \frac{sv + tu}{tv}, \quad \text{and} \quad x_1x_2 = \frac{su}{tv}, \]
so
\[ h([x_1 + x_2, x_1x_2, 1]) = h\left(\left[\frac{sv + tu}{tv}, \frac{su}{tv}, 1\right]\right) = h([sv + tu, su, tv]). \]
Now $sv + tu$, $su$, and $tv$ cannot have a common factor by Gauss' Lemma,
so $h(F_1F_2) = h([x_1 + x_2, x_1x_2, 1])$, and Equations (7.18) and (7.19) give
Equation (7.16).

Change variables and work with $x_1 + x_2$ and $x_1x_2$:
\[ x_3 + x_4 = \frac{2(x_1 + x_2)(a + x_1x_2) + 4b}{(x_1 + x_2)^2 - 4x_1x_2} \quad \text{and} \quad x_3x_4 = \frac{(x_1x_2 - a)^2 - 4b(x_1 + x_2)}{(x_1 + x_2)^2 - 4x_1x_2}. \]


Lemma 7.17. Assume that $4a^3 + 27b^2 \ne 0$. Then the map $\mathbb{P}^2(\mathbb{Q}) \to \mathbb{P}^2(\mathbb{Q})$
defined by
\[ [u, v, t] \mapsto [2u(at + v) + 4bt^2, (v - at)^2 - 4btu, u^2 - 4tv] \]
is a morphism of degree 2.


The formulas in Lemma 7.17 come from setting $u = x_1 + x_2$ and $v = x_1x_2$
and using $t$ to make the expressions homogeneous.


PROOF OF LEMMA 7.17. Suppose the three polynomials vanish:
\[ 2u(at + v) + 4bt^2 = 0, \tag{7.20} \]
\[ (v - at)^2 - 4btu = 0, \quad \text{and} \tag{7.21} \]
\[ u^2 - 4tv = 0. \tag{7.22} \]

If $t = 0$, then $u = 0$ by Equation (7.22), so $v = 0$ by Equation (7.21).
Suppose therefore that $t \ne 0$, and divide by $t^2$ in each equation. Write
\[ x = \frac{u}{2t}, \quad \text{so} \quad x^2 = \frac{v}{t} \]
by Equation (7.22). Equations (7.20) and (7.21) give
\[ (x^2 - a)^2 - 8bx = 0 \quad \text{and} \quad x(a + x^2) + b = 0, \]
or
\[ x^4 - 2ax^2 - 8bx + a^2 = 0 \quad \text{and} \quad x^3 + ax + b = 0. \]


These polynomials arose in the proof of the duplication formula on p. 188, 
where it was shown that they have no common zero. 


Now apply Lemma 7.17 to the vectors
\[ [x_1 + x_2, x_1x_2, 1] \quad \text{and} \quad [x_3 + x_4, x_3x_4, 1]. \]
Since the map from the first to the second is a morphism of degree 2,
\[ h([x_3 + x_4, x_3x_4, 1]) = 2h([x_1 + x_2, x_1x_2, 1]) + O(1) \]
by Lemma 7.4. Equation (7.16) shows that
\[ h([x_3 + x_4, x_3x_4, 1]) = h([x_3, 1]) + h([x_4, 1]) + O(1) \]
and
\[ h([x_1 + x_2, x_1x_2, 1]) = h([x_1, 1]) + h([x_2, 1]) + O(1), \]
so
\[ h([x_3, 1]) + h([x_4, 1]) = 2h([x_1, 1]) + 2h([x_2, 1]) + O(1), \]
and therefore
\[ h(P + Q) + h(P - Q) = 2h(P) + 2h(Q) + O(1). \]
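
The two symmetric-function formulas for $x_3 + x_4$ and $x_3x_4$ that drive this proof are easy to test on explicit points. The sketch below is a sanity check under stated assumptions, not part of the argument: it uses the curve $y^2 = x^3 - x + 1$ with the rational points $(1,1)$, $(0,1)$ and $(3,5)$, all chosen here for illustration, and only handles pairs of points with distinct $x$-coordinates.

```python
from fractions import Fraction

a, b = Fraction(-1), Fraction(1)  # illustrative curve y^2 = x^3 - x + 1


def x_of_sum(P, Q):
    """x-coordinate of P + Q by the chord formula; assumes x(P) != x(Q)."""
    (x1, y1), (x2, y2) = P, Q
    lam = (y2 - y1) / (x2 - x1)
    return lam * lam - x1 - x2


def neg(P):
    return (P[0], -P[1])


def predicted(x1, x2):
    """(x3 + x4, x3 * x4) computed from u = x1 + x2 and v = x1 * x2 only."""
    u, v = x1 + x2, x1 * x2
    denom = u * u - 4 * v  # equals (x1 - x2)^2
    return ((2 * u * (a + v) + 4 * b) / denom,
            ((v - a) ** 2 - 4 * b * u) / denom)


P = (Fraction(1), Fraction(1))
Q = (Fraction(3), Fraction(5))
x3, x4 = x_of_sum(P, Q), x_of_sum(P, neg(Q))
print(x3 + x4, x3 * x4, predicted(P[0], Q[0]))
```

For these points $x_3 = 0$ and $x_4 = 5$, and both formulas reproduce the sum 5 and the product 0 without ever using the $y$-coordinates.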


NOTES TO CHAPTER 7: The polynomial in Equation (7.15) is taken from Lehmer's
paper [97]; a starting point for Exercises 7.15 and 7.17(b) is [59] and the references
therein. Hilbert's Nullstellensatz may be found in any book on algebraic
geometry; an accessible account is in Reid's notes [122]. The statement about integrality
of torsion points in Exercise 7.5 is due to Lutz [101] and Nagell [112]. Accessible
proofs are in Cassels [27, Chapter 12], Husemöller [79, Chapter 5, Section 6] and
Silverman [139, Chapter VIII, Section 7]. The characterization of all torsion points
on the curves $y^2 = x^3 + ax$ (for $a$ integral and not divisible by a fourth power)
and $y^2 = x^3 + b$ (for $b$ integral and not divisible by a sixth power) is given in
Cassels [27, Exercise to Chapter 12]. Theorem 7.6 is proved in Lang [94], and Sil- 
verman [139]; a sketch proof from an advanced point of view is in the article by 
Milne [108, Theorem 20.10]; see also Lemmermeyer’s excellent Web notes on elliptic 
curves. Cassels’ article [26] is an excellent piece of background reading, in which he 
gives a plausible explanation as to how Mordell might have discovered what became 
known as the weak Mordell Theorem. Cremona’s Web site [37] gives the rank and 
a set of generators for thousands of elliptic curves. The strong form of Siegel’s The- 
orem (Theorem 7.15) can be found in Silverman [139, Chapter IX, Section 3]. The 
Mahler measure in Section 7.5 was introduced in two papers of Mahler [103], [104]. 
There are extensive references to the many places where the Mahler measure arises 
in [59], especially connections between heights and dynamical systems. Computa- 
tional material related particularly to the Mahler measure from Section 7.5 appears 
in a book by Borwein [18].
The Riemann Zeta Function


We saw in Chapter 1 that estimates for sums of arithmetic functions are an 
essential step in understanding arithmetic problems. One of the themes we 
wish to pursue is the following strange phenomenon: Arithmetic properties of 
integers, especially primes, can be deduced from analytic properties of func- 
tions. A serious instance is afforded by the Prime Number Theorem itself 
(see p. 3 for the $\sim$ and $o(x)$ notation).


Theorem 8.1. [PRIME NUMBER THEOREM] Asymptotically, the number of
primes is given by
\[ \pi(x) = |\{p \in \mathbb{P} \mid p \le x\}| \sim \frac{x}{\log x}. \]


The Prime Number Theorem is of major importance in number theory. The 
notes at the end of the chapter give references where proofs can be found, in- 
cluding elementary approaches (in this context “elementary” means “without 
recourse to complex analysis”, and not “easy” ). Around the beginning of the 
nineteenth century, Legendre published a conjecture equivalent to the Prime 
Number Theorem. Gauss also studied the values of $\pi(x)$ at a similar time and
conjectured$^1$ that
\[ \pi(x) \sim \operatorname{Li}(x) = \int_2^x \frac{1}{\log t}\,dt. \tag{8.1} \]
The Prime Number Theorem was first proved in 1896 by two mathematicians 
independently — Hadamard and de la Vallée Poussin. Their proofs used the 
Riemann zeta function and they were able to give an estimate for the error 


$^1$ For all small values of $x$, $\pi(x) < \operatorname{Li}(x)$, and several prominent mathematicians
conjectured that the inequality always holds. In 1914, Littlewood proved that the 
inequality reverses infinitely often. Amazingly, the smallest value of x where the 
inequality first reverses is still not known, although it is known to be below 10°”. 
This is a compelling instance of a situation where even enormous amounts of 
numerical evidence can be completely deceptive.
term in the formula, based upon an estimate for a zero-free region of the
zeta function. We will have more to say about the zeros of the Riemann zeta 
function in Section 9.2.1. 

In this chapter, we will start by giving a far-reaching refinement of the 
integral test that quickly gives sharper estimates for some arithmetic func- 
tions. We then develop the algebra of arithmetic functions with respect to a 
natural notion of multiplication, Dirichlet convolution. Finally, we apply these 
results to show how the Riemann zeta function may be extended to the whole 
complex plane with a simple pole at 1. 


8.1 Euler’s Summation Formula 


The integral test used on p. 10 compares the sum $\sum_{n=1}^{N}f(n)$ with the
integral $\int_1^N f(x)\,dx$. Euler's Summation Formula is a refinement of this tool that
allows us to derive sharper asymptotic formulas. Recall that
\[ \{t\} = t - \lfloor t\rfloor \tag{8.2} \]
denotes the fractional part of a real number $t$, where $\lfloor t\rfloor$ is the greatest integer
smaller than or equal to $t$.


Theorem 8.2. Let $a < b$ be real numbers, and suppose that $f$ is a complex-valued
function defined on $[a,b]$ with a continuous derivative on $(a,b)$. Then
\[ \sum_{a<n\le b}f(n) = \int_a^b f(t)\,dt + \int_a^b \{t\}f'(t)\,dt - f(b)\{b\} + f(a)\{a\}. \tag{8.3} \]


PROOF. We give the proof in the case $a, b \in \mathbb{N}$ for simplicity.
Suppose that $a \le n - 1 < n \le b$. Now
\[ \int_{n-1}^{n}\lfloor t\rfloor f'(t)\,dt = (n-1)\bigl(f(n) - f(n-1)\bigr) = \bigl(nf(n) - (n-1)f(n-1)\bigr) - f(n), \]
and summing over $n = a+1,\dots,b$ gives
\[ \sum_{n=a+1}^{b}f(n) = bf(b) - af(a) - \int_a^b \lfloor t\rfloor f'(t)\,dt. \tag{8.4} \]
On the other hand, integrating by parts,
\[ \int_a^b f(t)\,dt = [tf(t)]_a^b - \int_a^b tf'(t)\,dt. \tag{8.5} \]
Equations (8.4) and (8.5) together give
\[ \sum_{n=a+1}^{b}f(n) = \int_a^b f(t)\,dt + \int_a^b \{t\}f'(t)\,dt, \]
completing the proof.


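Applying the Euler Summation Formula to the harmonic series $\sum_{n=1}^{N}\frac{1}{n}$
already gives a nontrivial result. Here $a = 1$, $b = N$, and $f(t) = 1/t$. This gives
\[ \sum_{n=2}^{N}\frac{1}{n} = \int_1^N \frac{1}{t}\,dt - \int_1^N \frac{\{t\}}{t^2}\,dt = \log N - \int_1^N \frac{\{t\}}{t^2}\,dt. \tag{8.6} \]
Clearly
\[ \int_1^N \frac{\{t\}}{t^2}\,dt = \int_1^\infty \frac{\{t\}}{t^2}\,dt - \int_N^\infty \frac{\{t\}}{t^2}\,dt \]
and the last term is less than $\int_N^\infty \frac{1}{t^2}\,dt = \frac{1}{N}$. Adding 1 to each side of
Equation (8.6) gives the following result, which should be compared with
Exercise 1.2 to appreciate the power of the Euler Summation Formula.


Theorem 8.3.
\[ \sum_{n=1}^{N}\frac{1}{n} = \log N + \gamma + O\left(\frac{1}{N}\right), \quad \text{where} \quad \gamma = 1 - \int_1^\infty \frac{\{t\}}{t^2}\,dt \]
is the Euler-Mascheroni constant.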
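
Theorem 8.3 is easy to observe numerically. The sketch below compares the partial sums with $\log N + \gamma$; the decimal value used for $\gamma$ is the standard approximation and is supplied here as an assumption (it is not computed from the integral), and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # standard decimal approximation to gamma


def harmonic(N):
    """Partial sum 1 + 1/2 + ... + 1/N, accumulated accurately with fsum."""
    return math.fsum(1.0 / n for n in range(1, N + 1))


def remainder(N):
    """The error term in Theorem 8.3: H_N - log N - gamma."""
    return harmonic(N) - math.log(N) - EULER_GAMMA


for N in (10, 100, 1000, 10000):
    print(N, remainder(N), 1 / N)
```

The remainder stays positive and below $1/N$ in each case, consistent with the $O(1/N)$ error term.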
Definition 8.4. For $1 \le n \in \mathbb{N}$, let $d(n)$ denote the number of divisors of $n$.


For example, d(n) = 2 if and only if n is a prime. It follows that information 
about d reflects something about the distribution of the primes themselves. 


Exercise 8.1. Prove that d(n) is odd if and only if n is a square. 


Theorem 8.5.
\[ \sum_{n=1}^{N}d(n) = N\log N + (2\gamma - 1)N + O(\sqrt{N}). \]
PROOF. The Euler Summation Formula in the usual form with integer
boundaries gives a much larger remainder term of the form $O(N)$, which swamps
the $(2\gamma - 1)N$ term. For the sharper result with an $O(\sqrt{N})$ error term, we apply
the Euler Summation Formula in the more general form given in Theorem 8.2.
Notice that
\[ |\{(m,q) : mq \le x\}| = 2|\{(m,q) : mq \le x,\ m < q\}| + O(\sqrt{x}). \tag{8.7} \]
It follows that
\[ \sum_{n\le x}d(n) = \sum_{m\le x}\ \sum_{q\le x/m}1 = 2\sum_{mq\le x,\ m<q}1 + O(\sqrt{x}) = 2\sum_{m\le\sqrt{x}}\left(\left\lfloor\frac{x}{m}\right\rfloor - m\right) + O(\sqrt{x}) = 2x\sum_{m\le\sqrt{x}}\frac{1}{m} - 2\sum_{m\le\sqrt{x}}m + O(\sqrt{x}). \]
Now we can apply the Euler Summation Formula to each sum to obtain
\[ \sum_{n\le x}d(n) = 2x\left(\log\sqrt{x} + \gamma + O(x^{-1/2})\right) - 2\left(\frac{x}{2} + O(\sqrt{x})\right) + O(\sqrt{x}) = x\log x + (2\gamma - 1)x + O(\sqrt{x}) \]
as required.
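
The counting idea behind this proof also gives a fast way to evaluate $\sum_{n\le N}d(n)$. The sketch below uses the closely related exact identity $\sum_{n\le N}d(n) = 2\sum_{m\le\sqrt{N}}\lfloor N/m\rfloor - \lfloor\sqrt{N}\rfloor^2$, a standard variant of the hyperbola argument that counts the diagonal exactly instead of absorbing it into the error term; this variant and the function names are assumptions of the illustration, not the text's own formulation.

```python
import math

EULER_GAMMA = 0.5772156649015329  # standard decimal approximation to gamma


def divisor_sum_direct(N):
    """sum_{n<=N} d(n), counting pairs (m, q) with m*q <= N one m at a time."""
    return sum(N // m for m in range(1, N + 1))


def divisor_sum_hyperbola(N):
    """Same sum using only m <= sqrt(N), by symmetry in (m, q)."""
    u = math.isqrt(N)
    return 2 * sum(N // m for m in range(1, u + 1)) - u * u


def error_term(N):
    """Difference between the exact sum and N log N + (2*gamma - 1) N."""
    main = N * math.log(N) + (2 * EULER_GAMMA - 1) * N
    return divisor_sum_hyperbola(N) - main


for N in (10**3, 10**4, 10**5):
    print(N, divisor_sum_hyperbola(N), error_term(N), math.sqrt(N))
```

The second function needs only about $\sqrt{N}$ steps, and the printed error terms stay well within a constant multiple of $\sqrt{N}$, as Theorem 8.5 predicts.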


The sharper form of the Euler Summation Formula used in the proof of
Theorem 8.5 does not always give more precise results. For example,
Theorem 8.2 applied to the harmonic series gives
\[ \sum_{1\le n\le x}\frac{1}{n} = \log x + \gamma + O\left(\frac{1}{x}\right) - \frac{\{x\}}{x}; \]
the last summand is $O(\frac{1}{x})$ and so goes into the remainder term. In this case,
the general form does not give us more information, and for many summands
this will be the case.


Exercise 8.2. Prove Equation (8.7). (Hint: If $mq \le x$ and $m < q$, then $m < \sqrt{x}$.
The number of $q$ in this set for fixed $m$ is $\lfloor x/m\rfloor - m$; draw a picture to
see why.)


Another application of the Euler Summation Formula gives Stirling's
Formula.
Theorem 8.6. [STIRLING'S FORMULA]
\[ \log N! = N\log N - N + O(\log N). \]


PROOF. Clearly $\log N! = \sum_{n=1}^{N}\log n$. Put $f(t) = \log t$, and then by the Euler
Summation Formula
\[ \log N! = \int_1^N \log t\,dt + \int_1^N \frac{\{t\}}{t}\,dt = N\log N - N + O(\log N). \]
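
A quick numerical look at the size of the error term in Theorem 8.6. This is a hedged illustration only: it sums logarithms directly, mirroring the first line of the proof, and the function name is hypothetical.

```python
import math


def stirling_error(N):
    """log N! - (N log N - N), with log N! computed as a sum of logarithms."""
    log_factorial = math.fsum(math.log(n) for n in range(1, N + 1))
    return log_factorial - (N * math.log(N) - N)


for N in (10, 100, 1000, 10000):
    print(N, stirling_error(N), math.log(N))
```

The error grows roughly like $\tfrac12\log N$, comfortably inside the stated $O(\log N)$ bound.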


Stirling's Formula and its refinements are extremely important in many
parts of mathematics. A refinement of the Euler Summation Formula using
derivatives of higher order gives the more precise estimate
\[ \sqrt{2\pi}\,n^{n+1/2}e^{-n} < n! < \sqrt{2\pi}\,n^{n+1/2}e^{-n+1/(12n)}. \tag{8.8} \]
Exercise 8.3. *Prove the inequality (8.8).


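Exercise 8.4. Use the Euler Summation Formula to prove the following
asymptotic formulas, where $A$ and $B$ are constants.
\[ \text{(a)}\quad \sum_{n=1}^{N}\frac{\log n}{n} = \frac{1}{2}(\log N)^2 + A + O\left(\frac{\log N}{N}\right). \]
\[ \text{(b)}\quad \sum_{n=2}^{N}\frac{1}{n\log n} = \log\log N + B + O\left(\frac{1}{N\log N}\right). \]


Exercise 8.5. Prove that
\[ \sum_{n=1}^{N}\frac{d(n)}{n} = \frac{1}{2}(\log N)^2 + 2\gamma\log N + O(1), \]
where $\gamma$ is the Euler-Mascheroni constant.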


8.2 Multiplicative Arithmetic Functions 


Recall from Definition 3.4 that an arithmetic function $f$ is multiplicative if
\[ f(mn) = f(m)f(n) \quad \text{when } \gcd(m,n) = 1. \]


It turns out that many functions of interest in number theory have this prop- 
erty. 


Definition 8.7. The Möbius function $\mu$ is defined by $\mu(1) = 1$ and
\[ \mu(n) = \begin{cases} (-1)^k & \text{if } n \text{ is a product of } k \text{ distinct primes, and} \\ 0 & \text{otherwise.} \end{cases} \]
Exercise 8.6. By Fermat's Little Theorem (Theorem 1.12), for any prime $p$
and integer $a$, $a^p - a \equiv 0$ modulo $p$. The Möbius function gives a natural
generalization to a composite modulus.

(a) For any $a, n \in \mathbb{N}$, prove that
\[ \sum_{d\mid n}\mu(n/d)a^d \equiv 0 \pmod{n}. \]


(b)*For any a,k,n € N, prove that 


S-u(n/dja” =0 (mod n). 
dl 


Remarkably, the Prime Number Theorem is also intimately connected with
growth properties of the Möbius function, and in fact
\[ \text{Prime Number Theorem} \iff \sum_{n\le x}\mu(n) = o(x). \tag{8.9} \]
That is, the Prime Number Theorem follows from and implies the fact
that $\frac{1}{x}\sum_{n\le x}\mu(n) \to 0$. Similarly, the important Riemann Hypothesis
(Conjecture 9.7 in the next chapter) is equivalent to a conjectured result about
partial sums of the Möbius function.


Exercise 8.7. Prove Tchebychef’s weak form of the Prime Number Theorem 
by the following steps. 
(a) Let N be an integer and p a prime. Show that the largest power of p 
dividing N! is exactly 


(b) Use (a) to show that 


wo (UES) 


pen P 


(c) Use Stirling’s Formula (Theorem 8.6) to show that 


l 
w > 28? = log N + O(N). 
pen P 


(d) Deduce that there exist constants $A, B > 0$ with the property that
\[ A\frac{N}{\log N} \le \pi(N) \le B\frac{N}{\log N} \quad \text{for all } N \ge 2. \]
Exercise 8.8. Let $0 < a < b$ be fixed real numbers and assume the Prime
Number Theorem.

(a) Prove that $\frac{\pi(ax)}{\pi(bx)} \to \frac{a}{b}$ as $x \to \infty$.

(b) Deduce that there is some $X$ such that for any $x > X$ there is a prime $p$
with $ax < p < bx$.


(c) Deduce that there is a rational $\frac{p}{q}$, $a < \frac{p}{q} < b$, with $p$ and $q$ both prime.


The Möbius function appears in a striking prime formula due to Gandhi.


Exercise 8.9. *Let P, denote the product of the first n primes. Prove that 
the next prime p,4+1 is the unique integer m with the property that 


lei 2™ 


Theorem 8.8. The Möbius function is multiplicative. Moreover,
\[ \sum_{d\mid n}\mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{otherwise.} \end{cases} \]


PROOF. Let $m$ and $n$ be integers with $\gcd(m,n) = 1$, and factorize $m$ and $n$ as
products of prime powers. The primes involved must all be distinct. If, in the
factorization there is an exponent of 2 or more, then we are done since both
sides of the equation $\mu(mn) = \mu(m)\mu(n)$ are zero. If $m$ (and $n$) is a product
of $k$ (resp. $\ell$) distinct primes, then $mn$ is a product of $k + \ell$ primes, all of
which are distinct since $m$ and $n$ are coprime. It follows that $\mu(m) = (-1)^k$
and $\mu(n) = (-1)^\ell$, and so
\[ \mu(mn) = (-1)^{k+\ell} = \mu(m)\mu(n). \]
For the next claim, it is sufficient to check the prime power case since again
both sides of the equation of Theorem 8.8 are multiplicative (by the argument
used in the proof of Theorem 3.9). If $n = p^r$ with $r \ge 1$, the left-hand side is
\[ \mu(1) + \mu(p) = 1 - 1 = 0, \]
which completes the proof.


Theorem 8.9. For all integers n ≥ 1,

\[ \phi(n) = \sum_{d \mid n} \mu(d)\, \frac{n}{d}. \]


164 8 The Riemann Zeta Function 


8.3 Dirichlet Convolution 


The proof of Theorem 8.9 is a special instance of a general technique. 


Definition 8.10. The convolution of arithmetic functions f and g is the function
f * g defined by

\[ (f * g)(n) = \sum_{d \mid n} f(d)\, g\!\left(\frac{n}{d}\right). \]


Theorem 8.11. Convolution is commutative and associative. In other words,
\[ f * g = g * f \quad \text{and} \quad (f * g) * h = f * (g * h) \]
for any arithmetic functions f, g, and h.


PROOF. The sum in

\[ (f * g)(n) = \sum_{d \mid n} f(d)\, g\!\left(\frac{n}{d}\right) \]

runs over all pairs d, e ∈ N with de = n, so it is equal to

\[ \sum_{de = n} f(d)\, g(e), \]

and the latter expression is symmetric in f and g.

To see that convolution is associative, check the property for n = p a prime
by hand. The proof in the general case goes in much the same way as the proof
of commutativity:

\[ ((f * g) * h)(n) = (f * (g * h))(n) = \sum_{cde = n} f(c)\, g(d)\, h(e), \]

from which associativity is clear.
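Convolution is straightforward to experiment with on a computer. The following sketch implements the definition directly and spot-checks commutativity and associativity on a finite range; the names `convolve`, `identity`, `ones`, and `squares` are illustrative and not from the text.

```python
def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def convolve(f, g):
    """Return the Dirichlet convolution f * g as a function on the positive integers."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


identity = lambda n: n        # the function n -> n
ones = lambda n: 1            # the constant function 1
squares = lambda n: n * n     # the function n -> n^2

sigma = convolve(identity, ones)   # sum-of-divisors function
print(sigma(12))

commutative = all(convolve(identity, squares)(n) == convolve(squares, identity)(n)
                  for n in range(1, 61))
associative = all(convolve(convolve(identity, ones), squares)(n)
                  == convolve(identity, convolve(ones, squares))(n)
                  for n in range(1, 61))
print(commutative, associative)
```

A finite check of this kind is of course no substitute for the proof; it only guards against misreading the definition.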


Lemma 8.12. Define the arithmetic function I by I(1) = 1 and I(n) = 0 for
all n > 1. Then, for any arithmetic function f,

\[ f * I = I * f = f. \]


PROOF.

\[ (f * I)(n) = \sum_{d \mid n} f(d)\, I\!\left(\frac{n}{d}\right) = f(n)\, I(1) = f(n) \]

since all the other summands are zero by the definition of I.


Theorem 8.13. If f is an arithmetic function with f(1) ≠ 0, then there is a
unique arithmetic function g such that f * g = I. This function is denoted $f^{-1}$.

PROOF. The equation $(f * g)(1) = f(1)g(1) = I(1) = 1$ determines g(1). Then define g
recursively as follows. Assuming that g(1), ..., g(n − 1) have been defined
uniquely for some n > 1, the equation

\[ 0 = I(n) = (f * g)(n) = f(1)\, g(n) + \sum_{1 < d \mid n} f(d)\, g\!\left(\frac{n}{d}\right) \]

allows us to calculate g(n) uniquely, since f(1) ≠ 0.


Example 8.14. Let u(n) = 1 for all n. Then, by Theorem 8.8,
\[ u^{-1} = \mu. \]  (8.10)
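The recursive construction in the proof of Theorem 8.13 translates directly into a short program. The sketch below computes the convolution inverse on a finite range using exact rational arithmetic, and applies it to the constant function 1; the function name `dirichlet_inverse` is an illustrative choice.

```python
from fractions import Fraction


def dirichlet_inverse(f, limit):
    """Return {n: g(n)} for 1 <= n <= limit with f * g = I, assuming f(1) != 0."""
    g = {1: Fraction(1) / f(1)}
    for n in range(2, limit + 1):
        # Sum over the divisors d > 1 of n; g(n/d) is already known.
        partial = sum(f(d) * g[n // d] for d in range(2, n + 1) if n % d == 0)
        g[n] = -partial / f(1)
    return g


u = lambda n: 1
inverse_of_u = dirichlet_inverse(u, 12)
print([int(inverse_of_u[n]) for n in range(1, 13)])
```

The printed values can be compared by hand with the Möbius function on 1, ..., 12.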


Exercise 8.10. Let f be a multiplicative arithmetic function with f(1) ≠ 0.
(a) Prove that $f^{-1}(n) = \mu(n) f(n)$ for all square-free n.
(b) Prove that $f^{-1}(p^{2}) = f(p)^{2} - f(p^{2})$ for all primes p.


Exercise 8.11. Let f be an arithmetic function, and consider the (formal)
relationship

\[ \prod_{n=1}^{\infty} (1 - z^{n})^{f(n)/n} = \sum_{n=0}^{\infty} R(n)\, z^{n}. \]  (8.11)

(a) Prove that $R(n) = -\frac{1}{n} \sum_{a=1}^{n} (f * u)(a)\, R(n - a)$ for all n ≥ 1.

(b) Assume that R(0) = 1. Prove that f is uniquely determined by Equa- 
tion (8.11). 

(c) For f(n) = n®, prove that R(n) = —4 7?_,(n®+1)-R(n—a) for all n > 1. 


Exercise 8.12. If f is multiplicative, prove that f is completely multiplicative
if and only if $f^{-1}(p^{a}) = 0$ for all primes p and all a ≥ 2.


Exercise 8.13. Define an arithmetic function ν(n) to be 0 when n = 1 and
the number of distinct prime factors of n for n > 1. Let f = μ * ν. Prove
that f(n) ∈ {0, 1} for all n ∈ N.


Exercise 8.14. (a) Prove that the collection of all arithmetic functions f
with f(1) ≠ 0 forms an Abelian group under Dirichlet convolution.

(b) Prove that the multiplicative arithmetic functions form a subgroup. 

(c) Show by example that the completely multiplicative functions do not form 
a subgroup. 


Theorem 8.15. [MÖBIUS INVERSION FORMULA] Given arithmetic functions f
and g,

\[ f(n) = \sum_{d \mid n} g(d) \quad \text{if and only if} \quad g(n) = \sum_{d \mid n} f(d)\, \mu\!\left(\frac{n}{d}\right). \]


PROOF. Assume that $f(n) = \sum_{d \mid n} g(d)$, and let u(n) = 1 for all n as in
Example 8.14. Then f = g * u. Convolve both sides of f = g * u with μ and
use Equation (8.10) to see that

\[ f * \mu = g * u * \mu = g * I = g, \]

so

\[ g(n) = \sum_{d \mid n} f(d)\, \mu\!\left(\frac{n}{d}\right). \]

For the converse, convolve g = f * μ with u.


Thus Theorem 3.9 and Theorem 8.9 are equivalent: We can move from
one to the other by convolving with the Möbius function or its convolution
inverse.
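As a concrete instance of this equivalence, the sketch below recovers the Euler function from the identity function by Möbius inversion, and confirms the companion identity that the values φ(d) over the divisors d of n sum to n. It assumes the usual definition of φ(n) as the number of 1 ≤ k ≤ n coprime to n; all function names are illustrative.

```python
from math import gcd


def mobius(n):
    """Möbius function mu(n), computed by trial division."""
    if n == 1:
        return 1
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign


def phi_by_inversion(n):
    """Sum over d | n of mu(d) * n/d."""
    return sum(mobius(d) * (n // d) for d in range(1, n + 1) if n % d == 0)


def phi_by_counting(n):
    """Number of integers 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


inversion_agrees = all(phi_by_inversion(n) == phi_by_counting(n) for n in range(1, 200))
divisor_sum_is_n = all(sum(phi_by_counting(d) for d in range(1, n + 1) if n % d == 0) == n
                       for n in range(1, 200))
print(inversion_agrees, divisor_sum_is_n)
```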


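Exercise 8.15. Suppose σ denotes a real number for which

\[ F(\sigma) = \sum_{n=1}^{\infty} \frac{f(n)}{n^{\sigma}} \quad \text{and} \quad G(\sigma) = \sum_{n=1}^{\infty} \frac{g(n)}{n^{\sigma}} \]

are absolutely convergent series. Prove that

\[ F(\sigma) \cdot G(\sigma) = \sum_{n=1}^{\infty} \frac{(f * g)(n)}{n^{\sigma}}. \]
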
Example 8.16. If f * g = I, then F(σ)G(σ) = 1, so G(σ) = 1/F(σ) wherever both series converge absolutely.


Series such as F, G, and F-G are called Dirichlet series. We next study the 
Riemann zeta function in the context of Dirichlet series. 


Exercise 8.16. For all s € C with R(s) > 2, show that 


The traditional notation for the variable s in Definition 1.4 of the Riemann
zeta function is
\[ s = \sigma + it \quad \text{with } \sigma, t \in \mathbb{R}. \]

For s with real part σ = ℜ(s) > 1, we claim that the series $\sum_{n=1}^{\infty} \frac{1}{n^{s}}$ converges
absolutely. To prove this, notice that

\[ n^{-s} = n^{-\sigma} n^{-it} = n^{-\sigma} e^{-it \log n} \]

has modulus $n^{-\sigma}$ and that $\sum_{n=1}^{\infty} \frac{1}{n^{\sigma}}$ is a convergent series by the integral test.
8.3.1 Application of Möbius Inversion to Zsigmondy’s Theorem


Before showing how Theorem 8.15 can be used to prove Zsigmondy’s Theorem
(Theorem 1.15), a preliminary observation needs to be made. The polynomial
$x^{n} - 1$ already has some natural factorization according to the divisors
of n. If d ≥ 1 denotes any integer, let $\phi_d$ denote the monic polynomial whose
zeros are the primitive² dth roots of unity. The polynomial $\phi_d$ is known as
the dth cyclotomic polynomial.

A simple application of Galois theory says that

\[ \phi_d(x) \in \mathbb{Z}[x] \quad \text{for every } d \ge 1. \]

If you are not familiar with Galois theory, we ask you to take this on trust. A
natural factorization of $x^{n} - 1$ into integral polynomials follows at once, by
dividing the nth roots of unity into the primitive dth roots of unity for d
running over the divisors of n,

\[ x^{n} - 1 = \prod_{d \mid n} \phi_d(x). \]  (8.12)

Exercise 8.17. Compute the polynomials $\phi_d$ for 1 ≤ d ≤ 15.


The factorization given by Equation (8.12) into integral polynomials yields
a partial factorization of $M_n = 2^{n} - 1$ into integers,

\[ 2^{n} - 1 = \prod_{d \mid n} \phi_d(2). \]  (8.13)

The first thing to notice about Equation (8.13) is that, by definition, any
primitive divisor of $M_n$ must divide $\phi_n(2)$. The proof of Theorem 1.15 proceeds
by showing that any factor of $\phi_n(2)$ which is common to $M_d$ for some d < n
must itself already divide n. Therefore, as soon as $\phi_n(2)$ exceeds n, $M_n$ is
guaranteed to have a primitive divisor.

We claim that for every n > 6, $\phi_n(2) > n$. To prove this, note first that
for all n ≥ 1,


(8.14) 


To prove this inequality, apply Möbius inversion to Equation (8.13) to see
that the logarithm of the left-hand side of the inequality (8.14) is

\[ \sum_{d \mid n} \mu(n/d) \log(2^{d} - 1). \]

This is bounded below by $\phi(n) \log 2 - 1$ using Theorem 8.9 and an easy estimate
for the logarithm.

² A complex number z is a primitive dth root of unity if $z^{d} = 1$, but $z^{e} \ne 1$ for
any e with 0 < e < d. Thus the primitive dth roots of unity are precisely the
complex numbers

\[ e^{2\pi i k/d} \quad \text{with } \gcd(k, d) = 1. \]

Thus there are exactly $\phi(d)$ distinct primitive dth roots of unity, where $\phi$ denotes
the Euler function defined on p. 61.

The right-hand term of the inequality (8.14) can be estimated using Corollary 3.6
and the bound (1.4). We deduce that if $\phi_n(2) \le n$ then

\[ \log n < 2 \log\log n + C \]  (8.15)

for a constant C > 0 which can be made explicit. Clearly, this inequality
bounds n and so we have completed the proof that for large enough n, the
logarithm of the right-hand side of the inequality (8.14) is greater than log n.
This completes the proof of the original claim, albeit in an inexplicit way.


Exercise 8.18. Find an explicit value for the constant C in the inequality (8.15)
and use this to find an explicit bound for n.


Applying the multiplicative form of Theorem 8.15 to Equation (8.12) yields

\[ \phi_n(x) = \prod_{d \mid n} (x^{d} - 1)^{\mu(n/d)}, \]  (8.16)

so in particular

\[ \phi_n(2) = \prod_{d \mid n} (2^{d} - 1)^{\mu(n/d)}. \]  (8.17)

Thus, Equation (8.17) gives

\[ \operatorname{ord}_p(\phi_n(2)) = \sum_{d \mid n} \mu(n/d)\, \operatorname{ord}_p(M_d). \]  (8.18)

Now suppose that p is a prime with $p \mid M_n$ and $p \mid M_d$ for some d, 1 < d < n
(so p is not a primitive divisor of $M_n$). Let $d_0$ be the smallest value of d for
which $p \mid M_d$. Now the sequence $(M_n)$ has the strong divisibility property of
Exercise 1.15(c) on p. 27, namely,

\[ \gcd(M_n, M_m) = M_{\gcd(n,m)} \quad \text{for all } m, n. \]

Thus we may assume that $d_0 \mid n$. By Exercise 1.15(b),

\[ \operatorname{ord}_p(M_{n d_0}) = \operatorname{ord}_p(M_{d_0}) + \operatorname{ord}_p(n), \]  (8.19)

and $\operatorname{ord}_p(M_d) = 0$ unless d is a multiple of $d_0$. Use Equation (8.19) to
write Equation (8.18) as

\[ \begin{aligned} \operatorname{ord}_p(\phi_n(2)) &= \sum_{d \mid (n/d_0)} \mu\!\left(\frac{n}{d d_0}\right) \bigl( \operatorname{ord}_p(M_{d_0}) + \operatorname{ord}_p(d) \bigr) \\ &= \operatorname{ord}_p(M_{d_0}) \sum_{d \mid (n/d_0)} \mu\!\left(\frac{n}{d d_0}\right) + \sum_{d \mid (n/d_0)} \mu\!\left(\frac{n}{d d_0}\right) \operatorname{ord}_p(d). \end{aligned} \]

Since $n > d_0$, the first term vanishes because

\[ \sum_{d \mid n} \mu(d) = 0 \quad \text{for } n > 1 \]

by Theorem 8.8. The second term is bounded above by $\operatorname{ord}_p(n)$. As shown
above, $\phi_n(2) > n$ for all n > 6, which concludes the proof of Theorem 1.15
on p. 27.
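The quantities in this argument are easy to compute for small n. The following sketch evaluates the nth cyclotomic polynomial at 2 through the product formula (8.17) and, separately, strips from $2^{n} - 1$ every prime shared with an earlier term $2^{d} - 1$, d < n; a remaining factor greater than 1 signals a primitive divisor. The function names and the ranges tested are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from math import gcd


def mobius(n):
    """Möbius function mu(n), computed by trial division."""
    if n == 1:
        return 1
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign


def cyclotomic_at_2(n):
    """Value at 2 of the n-th cyclotomic polynomial, via the product over d | n."""
    value = Fraction(1)
    for d in range(1, n + 1):
        if n % d == 0:
            value *= Fraction(2**d - 1) ** mobius(n // d)
    assert value.denominator == 1
    return value.numerator


def primitive_part(n):
    """2^n - 1 with every prime that also divides some 2^d - 1, d < n, removed."""
    m = 2**n - 1
    for d in range(1, n):
        g = gcd(m, 2**d - 1)
        while g > 1:
            m //= g
            g = gcd(m, 2**d - 1)
    return m


no_primitive_divisor = [n for n in range(1, 40) if primitive_part(n) == 1]
print(no_primitive_divisor)
print(all(cyclotomic_at_2(n) > n for n in range(7, 60)))
```

The case n = 6 is instructive: $2^{6} - 1 = 63 = 3^{2} \cdot 7$, and both 3 and 7 already divide earlier terms.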

The next exercise uses similar methods to solve a special case of a deep 
general result due to Bilu, Hanrot and Voutier. The result was first shown by 
Carmichael in 1913. 


Exercise 8.19. [CARMICHAEL] Let A and B be nonzero integers. The Lucas
sequence associated with the pair (A, B) is the sequence $(u_n)$ defined by

\[ u_n = \frac{\alpha^{n} - \beta^{n}}{\alpha - \beta}, \]

where α and β are the roots of the equation

\[ x^{2} - Ax + B = 0. \]

Assume that α and β are real, and prove the following theorem: The term $u_n$
of the associated Lucas sequence has a primitive divisor for n ≠ 1, 2, 6, except
when A = B = −1 and n = 12. (Hint: Let ζ be a primitive nth root of unity,
and write

\[ Q_n = \prod_{\substack{1 \le k \le n \\ \gcd(k,n) = 1}} (\alpha - \zeta^{k} \beta). \]

Show that $u_n$ will have a primitive divisor if $Q_n$ is not too small relative to the
size of n, and then show that the smallest values of $Q_n$ arise for A = 1, B = −1
and A = 3, B = 2.)


8.4 Euler Products 


Recall from Chapter 1 that the Riemann zeta function has a decomposition
as an Euler product. By Theorem 1.5,

\[ \zeta(\sigma) = \sum_{n=1}^{\infty} \frac{1}{n^{\sigma}} = \prod_{p} \left( 1 - p^{-\sigma} \right)^{-1}, \]

where, as before, the product over p means a product over all the primes,
and σ > 1. We will now show that this holds for all complex s with ℜ(s) > 1.
Since it is no more difficult, we prove a more general theorem.
Theorem 8.17. If f is a multiplicative arithmetic function, and $\sum_{n=1}^{\infty} f(n)$
converges absolutely, then

\[ \sum_{n=1}^{\infty} f(n) = \prod_{p} \left( 1 + f(p) + f(p^{2}) + \cdots \right). \]

If, in addition, f is completely multiplicative, then

\[ \sum_{n=1}^{\infty} f(n) = \prod_{p} \frac{1}{1 - f(p)}. \]


PROOF. Let

\[ P(x) = \prod_{p \le x} \left( 1 + f(p) + f(p^{2}) + \cdots \right). \]

Each factor is absolutely convergent by hypothesis, and there are a finite
number of factors, so

\[ P(x) = \sum_{n \in A(x)} f(n), \]

where

\[ A(x) = \{ n \in \mathbb{N} \mid \text{all prime factors of } n \text{ are less than or equal to } x \}. \]

Now consider the difference

\[ \left| \sum_{n=1}^{\infty} f(n) - \sum_{n \in A(x)} f(n) \right| \le \sum_{n \notin A(x)} |f(n)| \le \sum_{n > x} |f(n)|. \]

The last sum tends to zero as x tends to infinity because it is the tail of a
convergent series. The identity for completely multiplicative functions follows
from the general Euler product expansion because in this case each factor of
the infinite product is a convergent geometric series.
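The proof bounds the gap between the full sum and the truncated product P(x) by the tail of the series. A quick numerical illustration, assuming the completely multiplicative function $f(n) = n^{-2}$ and the classical value $\sum_{n \ge 1} n^{-2} = \pi^{2}/6$, is sketched below; the function names are illustrative.

```python
from math import pi


def primes_up_to(x):
    """Sieve of Eratosthenes: the list of primes p <= x."""
    if x < 2:
        return []
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(x**0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, x + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, x + 1) if is_prime[p]]


def truncated_euler_product(sigma, x):
    """Product over primes p <= x of 1 / (1 - p^(-sigma))."""
    product = 1.0
    for p in primes_up_to(x):
        product *= 1.0 / (1.0 - p ** (-sigma))
    return product


x = 10000
gap = pi**2 / 6 - truncated_euler_product(2, x)
# The proof predicts 0 < gap <= sum_{n > x} 1/n^2, which is less than 1/x.
print(gap, 1 / x)
```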


The nonvanishing condition of Remark 1.6 on p. 15 is automatically satis- 
fied in the setting of Theorem 8.17 for all completely multiplicative functions f 
since the limit of a nontrivial convergent geometric series cannot be zero. 

Much of the study of the Riemann zeta function involves complex analysis. 


Definition 8.18. Let S be an open subset of C. A function f : S → C is called
complex differentiable or holomorphic on S if the limit

\[ \lim_{h \to 0} \frac{f(z + h) - f(z)}{h} \]

exists and is finite for all z ∈ S. If for all z ∈ S, f equals its own Taylor series
in a small neighborhood of z,

\[ f(z + h) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z)}{n!}\, h^{n} \quad \text{for small } h, \]

then f is called analytic on S.




Recall that all functions holomorphic on S are analytic on S and vice versa, 
in contrast with the case of real functions of a real variable, where “analytic” 
is a strictly stronger condition than “differentiable infinitely often.” 

Our next goal is to show that the function $s \mapsto \zeta(s)$ is analytic on the
half-plane ℜ(s) > 1. It is tempting to argue as follows. Each of the individual
functions $s \mapsto n^{-s}$ is analytic, so the convergent sum should be analytic with
derivative $\sum_{n=1}^{\infty} \frac{-\log n}{n^{s}}$. Unfortunately, an infinite sum of analytic functions
might not be analytic; indeed it might not even be continuous.


Example 8.19. Let
\[ f_n(x) = \frac{x^{2}}{(1 + x^{2})^{n}} \]
for n ≥ 0, and let $f(x) = \sum_{n=0}^{\infty} f_n(x)$. We can sum the $f_n$ because they form
a geometric progression,

\[ \sum_{n=0}^{N} f_n(x) = x^{2}\, \frac{1 - (1 + x^{2})^{-(N+1)}}{1 - (1 + x^{2})^{-1}} = (1 + x^{2}) - \frac{1}{(1 + x^{2})^{N}}. \]

Now when x ≠ 0 we can let N tend to infinity, the second term tends to
zero, and the whole sum converges to $f(x) = 1 + x^{2}$. However, $f_n(0) = 0$ for
all n ≥ 0, so $f(0) = \sum_{n=0}^{\infty} f_n(0) = 0$ also. Thus the limit function f is not even
continuous, although all the summands $f_n$ are analytic on a neighborhood of 0.
The same phenomenon for complex z is seen in the region {z | |arg(z)| < π/4}.
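The failure of continuity in Example 8.19 can be seen numerically. The sketch below assumes the summation starts at n = 0, which is what the stated limit $1 + x^{2}$ requires; the function name is illustrative. For fixed N, the partial sum at a point x close to 0 is still far from the limiting value, which is exactly the lack of uniformity.

```python
def partial_sum(x, N):
    """Sum of x^2 / (1 + x^2)^n for n = 0, ..., N."""
    return sum(x * x / (1 + x * x) ** n for n in range(N + 1))


# Away from 0 the partial sums settle quickly near 1 + x^2.
print(partial_sum(0.5, 200), 1 + 0.5**2)

# At 0 every term vanishes, so the limit there is 0 rather than 1.
print(partial_sum(0.0, 200))

# Close to 0, a fixed number of terms is nowhere near the limit 1 + x^2.
print(partial_sum(0.01, 100), 1 + 0.01**2)
```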


One useful criterion to make sure that nothing goes wrong in manipulating 
series of functions is uniform convergence. 


8.5 Uniform Convergence 


Definition 8.20. Let S ⊂ C be a nonempty set. A sequence $(F_n)$ converges
pointwise to F on S if, for every s ∈ S, $F_n(s) \to F(s)$. The sequence $(F_n)$
converges uniformly to F on S if for all ε > 0 there exists N = N(ε) such that
for all n > N

\[ |F(s) - F_n(s)| < \varepsilon \quad \text{for all } s \in S. \]


The uniformity in this definition is that the number N is not allowed to 
depend on s. Many useful properties of the terms of a sequence of functions 
are inherited by the limit function if the convergence is uniform. 


Theorem 8.21. Suppose that the sequence of functions $(F_n)$ converges to F
uniformly on S. If each $F_n$ is continuous on S, then F is continuous on S.

PROOF. Fix $s_0 \in S$ and ε > 0. Choose N such that, for all s ∈ S,

\[ |F(s) - F_N(s)| < \frac{\varepsilon}{3}. \]  (8.20)

This is possible because the sequence of functions $(F_n)$ is converging uniformly
on S. Next, choose δ > 0 such that, for all s ∈ S with $|s - s_0| < \delta$,

\[ |F_N(s) - F_N(s_0)| < \frac{\varepsilon}{3}. \]  (8.21)

This is possible because $F_N$ is continuous at $s_0$.
Now we have set the stage: For all s ∈ S with $|s - s_0| < \delta$, we have

\[ \begin{aligned} |F(s) - F(s_0)| &= |F(s) - F_N(s) + F_N(s) - F_N(s_0) + F_N(s_0) - F(s_0)| \\ &\le |F(s) - F_N(s)| + |F_N(s) - F_N(s_0)| + |F_N(s_0) - F(s_0)| \\ &< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon. \end{aligned} \]

We have used the inequality (8.20) twice, in order to estimate the first and
third terms, and the inequality (8.21) for the second term. This proves that F
is continuous at each point $s_0 \in S$.


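Theorem 8.22. For every δ > 0, the partial sums of the Riemann zeta function
converge uniformly on $S_{1+\delta} = \{ s \in \mathbb{C} : \Re(s) > 1 + \delta \}$. That is,

\[ \sum_{n=1}^{N} \frac{1}{n^{s}} \to \zeta(s) \quad \text{as } N \to \infty \]

uniformly on $S_{1+\delta}$.


In particular, by Theorem 8.21, the Riemann zeta function is continuous
on

\[ \bigcup_{\delta > 0} S_{1+\delta} = \{ s \in \mathbb{C} : \Re(s) > 1 \} = S_1. \]

The convergence is not uniform on the whole of $S_1$.

PROOF OF THEOREM 8.22. Notice that for any $s \in S_{1+\delta}$,

\[ \left| \zeta(s) - \sum_{n=1}^{N} \frac{1}{n^{s}} \right| = \left| \sum_{n=N+1}^{\infty} \frac{1}{n^{s}} \right| \le \sum_{n=N+1}^{\infty} \frac{1}{n^{1+\delta}}. \]

Given any ε > 0, this is less than ε when N is large, independently of s,
showing uniform convergence.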
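The point of the proof is that the tail bound depends only on δ and N, not on s. The sketch below illustrates this with the explicit integral estimate $\sum_{n > N} n^{-(1+\delta)} \le 1/(\delta N^{\delta})$, which is an assumption added here for concreteness, and compares it with differences of partial sums at several complex points of $S_{1+\delta}$. The function names and sample points are illustrative.

```python
def partial_zeta(s, N):
    """Partial sum of the zeta series: sum of n^(-s) for n = 1, ..., N."""
    return sum(n ** (-s) for n in range(1, N + 1))


def tail_bound(delta, N):
    """Integral-test bound for the sum of n^(-(1 + delta)) over n > N."""
    return 1.0 / (delta * N**delta)


delta, N, M = 0.5, 100, 5000
samples = [1.6 + 0j, 1.6 + 10j, 3 - 7j, 2 + 50j]   # all have real part > 1 + delta

# |sum_{N < n <= M} n^(-s)| stays below the same bound for every sample point.
differences = [abs(partial_zeta(s, M) - partial_zeta(s, N)) for s in samples]
print(max(differences), tail_bound(delta, N))
```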
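Exercise 8.20. (see Section 6.1) Let L denote a lattice in C with an associated
Weierstrass ℘-function defined by

\[ \wp_L(z) = \frac{1}{z^{2}} + \sum_{0 \ne \ell \in L} \left\{ \frac{1}{(z - \ell)^{2}} - \frac{1}{\ell^{2}} \right\} \quad \text{for } z \notin L. \]  (8.22)

Prove that

\[ \frac{1}{z^{2}} + \sum_{\substack{0 \ne \ell \in L \\ |\ell| \le n}} \left\{ \frac{1}{(z - \ell)^{2}} - \frac{1}{\ell^{2}} \right\} \to \wp_L(z) \]

uniformly as n → ∞ on any compact subset of C \ L.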


8.6 The Zeta Function Is Analytic 


The next consequence of uniform convergence is that it preserves the analyt- 
icity of complex functions. 


Theorem 8.23. Suppose S ⊂ C is open, and we have a function F : S → C
and a sequence of functions $F_N : S \to \mathbb{C}$ converging to F uniformly on S. If
each $F_N$ is analytic, then F is analytic.

Example 8.24. The function $F_N(s) = \sum_{n=1}^{N} \frac{1}{n^{s}}$ is analytic and converges
uniformly to ζ(s) on every $S_{1+\delta}$, δ > 0, as in Theorem 8.22, so by Theorem 8.23,
the Riemann zeta function is analytic on $S_{1+\delta}$ for every δ > 0; in other
words, ζ is analytic on {s ∈ C | ℜ(s) > 1}.


PROOF OF THEOREM 8.23. Given a fixed point a ∈ S, we have to prove that F
is analytic on a neighborhood of a. We use complex analysis, in particular
Cauchy’s formula. Let γ be a closed simple curve, that is, a finite join of smooth
curves such that a ∈ Int(γ) and the closure $\overline{\operatorname{Int}(\gamma)} \subset S$. Then Cauchy’s formula
says that for any function f that is analytic on S, and for any b ∈ Int(γ),

\[ f(b) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(z)}{z - b}\, dz. \]  (8.23)

Since S is open, it contains a small disk around a. We will need the following
result.


Lemma 8.25. Suppose a sequence of continuous functions $G_N : \gamma \to \mathbb{C}$ converges
uniformly on γ to a function G : γ → C. Then G is continuous and

\[ \lim_{N \to \infty} \int_{\gamma} G_N(s)\, ds = \int_{\gamma} G(s)\, ds. \]

PROOF. The continuity of G follows from Theorem 8.21, so in particular G is
integrable. Now

\[ \left| \int_{\gamma} G(s)\, ds - \int_{\gamma} G_N(s)\, ds \right| \le \operatorname{length}(\gamma) \max_{s \in \gamma} |G(s) - G_N(s)|. \]

The last quantity tends to zero by the definition of uniform convergence. This
completes the proof of the lemma.


Exercise 8.21. (a) Prove Morera’s Theorem: If f is a continuous function
on a domain D ⊂ C with the property that $\oint f(z)\, dz = 0$ for every closed
contour in D, then f is analytic on D.

(b) Use this to give a different proof of Theorem 8.23.


Now a magic wand can be waved to complete the proof of Theorem 8.23.
By hypothesis, the functions $F_N$ are analytic on S, so for all b ∈ Int(γ) ⊂ S,
by Cauchy’s formula (Equation (8.23)),

\[ F_N(b) = \frac{1}{2\pi i} \int_{\gamma} \frac{F_N(s)}{s - b}\, ds. \]

Define $G_N(s) = \frac{F_N(s)}{s - b}$. Then $G_N$ converges to $G(s) = \frac{F(s)}{s - b}$ uniformly on γ since

\[ |G_N(s) - G(s)| = \frac{|F_N(s) - F(s)|}{|s - b|} \le C\, |F_N(s) - F(s)|, \]

where $C = \max_{s \in \gamma} \frac{1}{|s - b|}$. This proves that for all b ∈ Int(γ)

\[ F(b) = \lim_{N \to \infty} F_N(b) = \lim_{N \to \infty} \frac{1}{2\pi i} \int_{\gamma} \frac{F_N(s)}{s - b}\, ds = \frac{1}{2\pi i} \int_{\gamma} \frac{F(s)}{s - b}\, ds. \]  (8.24)

Here, we have applied Lemma 8.25 to interchange a uniform limit and the
integral in the last step. Finally, recall that any function F satisfying Cauchy’s
formula (Equation (8.24)) on Int(γ) is analytic there; this may be seen as
follows. For all b ∈ Int(γ),

\[ \frac{F(b + h) - F(b)}{h} = \frac{1}{2\pi i h} \int_{\gamma} F(s) \left( \frac{1}{s - b - h} - \frac{1}{s - b} \right) ds = \frac{1}{2\pi i} \int_{\gamma} \frac{F(s)}{(s - b - h)(s - b)}\, ds \]

since the h’s cancel. In the last integral, the limit h → 0 may be taken, which
gives the derivative. Strictly speaking, we should establish uniform convergence
(as with the $G_N$ above) to be able to apply Lemma 8.25 again, but this
is straightforward.


Corollary 8.26. For all s with ℜ(s) > 1,

\[ \frac{d}{ds} \zeta(s) = -\sum_{n=1}^{\infty} \frac{\log n}{n^{s}}. \]


We will see later that many of the deeper properties of the Riemann zeta
function and their consequences for number theory take place for complex
values s = σ + it with σ < 1, and all we have done so far (including the
definition of the Riemann zeta function) does not apply to such values.


8.7 Analytic Continuation of the Zeta Function 


A very important idea from complex analysis is that of analytic continuation. 
Given a function f defined by a convergent power series on a disk D of positive 
radius, an analytic function defined on any domain containing D that coincides 
with f on D is called a continuation of f. 


Exercise 8.22. Prove the Uniqueness Theorem: Suppose that G is a domain
in C, and f and g are differentiable functions on G with f(z) = g(z) for
all z ∈ S, where S ⊂ G has a limit point in G. Then f(z) = g(z) for all z ∈ G.


Example 8.27. Consider the function defined by the power series

\[ g(s) = 1 + s + s^{2} + \cdots, \]

which converges for |s| < 1. Then g can be continued to a function that is
analytic on the whole of C except for a simple pole at s = 1. To see this, notice
that for |s| < 1, $g(s) = \frac{1}{1 - s}$. The latter expression is defined on C apart from
a simple pole at s = 1 with residue −1. Of course, g is not defined by the
series for |s| > 1.


Theorem 8.28. The Riemann zeta function has an analytic continuation 
to the whole of the complex plane with the exception of a simple pole with 
residue 1 at s = 1.


We will present two different proofs of this: a standard proof that may 
be found in any of the books on the topic and an alternative proof no doubt 
known to the experts but not readily available in the literature. 

In Chapter 9, we will find a functional equation for the Riemann zeta 
function. Using this, Theorem 8.28 may be deduced from the weaker statement 
that there is a continuation to R(s) > 0. The first proof is of this weaker 
statement. 
Theorem 8.29. The Riemann zeta function has an analytic continuation to
the set {s ∈ C | ℜ(s) > 0} with the exception of a simple pole at s = 1 with
residue 1.


FIRST (STANDARD) PROOF OF THEOREM 8.29. This involves a careful use
of the Euler Summation Formula; the care is needed because the formula as
stated only applies to finite intervals. Assume first that ℜ(s) > 1; then, by
the Euler Summation Formula,

\[ \sum_{n=2}^{N} \frac{1}{n^{s}} = \int_{1}^{N} \frac{dt}{t^{s}} + \int_{1}^{N} \frac{-s\{t\}}{t^{s+1}}\, dt. \]  (8.25)

The first term is

\[ \frac{1 - N^{1-s}}{s - 1}, \]

and since we suppose ℜ(s) > 1, we get $N^{1-s} \to 0$ as N tends to infinity. The
second term in Equation (8.25) also converges as N tends to infinity since

\[ \int_{1}^{\infty} \left| \frac{s\{t\}}{t^{s+1}} \right| dt \le |s| \int_{1}^{\infty} \frac{dt}{t^{\sigma+1}} < \infty. \]

This shows that $\sum_{n=1}^{\infty} \int_{n}^{n+1} \frac{-s\{t\}}{t^{s+1}}\, dt$ is absolutely convergent, so we are justified
in writing

\[ \zeta(s) = 1 + \sum_{n=2}^{\infty} \frac{1}{n^{s}} = 1 + \frac{1}{s - 1} - s \int_{1}^{\infty} \frac{\{t\}}{t^{s+1}}\, dt. \]  (8.26)

Lemma 8.30. The integral in Equation (8.26) represents an analytic function
on the range ℜ(s) > 0.


PROOF. Write

\[ I(s) = \int_{1}^{\infty} \frac{\{t\}}{t^{s+1}}\, dt = \sum_{n=1}^{\infty} f_n(s), \]

where $f_n(s) = \int_{n}^{n+1} \frac{\{t\}}{t^{s+1}}\, dt$. We will prove that for any δ > 0

(a) the series for I(s) converges uniformly on ℜ(s) > δ and
(b) each $f_n$ is analytic on the range ℜ(s) > 0.

Then we may apply Theorem 8.23 to complete the proof that I(s) is analytic
on ℜ(s) > 0. As for (a),

\[ \left| I(s) - \sum_{n=1}^{N} f_n(s) \right| = \left| \sum_{n=N+1}^{\infty} f_n(s) \right| \le \sum_{n=N+1}^{\infty} |f_n(s)| \le \int_{N+1}^{\infty} \frac{dt}{t^{\sigma+1}} = \frac{(N + 1)^{-\sigma}}{\sigma}, \]

and in absolute value this is smaller than $\frac{(N + 1)^{-\delta}}{\delta}$, so it tends to zero. The
bound depends on δ only, not on σ or s, which proves the uniform convergence.
For (b), consider the difference quotient

\[ \frac{f_n(s + h) - f_n(s)}{h} = \frac{1}{h} \int_{n}^{n+1} \frac{\{t\}}{t^{s+1}} \left( t^{-h} - 1 \right) dt. \]  (8.27)

Use a first-order Taylor approximation for the exponential

\[ t^{-h} = e^{-h \log t} = 1 - h \log t + f(h, t), \]

where

\[ f(h, t) = O\bigl( (h \log t)^{2} \bigr). \]  (8.28)

Substituting into Equation (8.27) gives

\[ \frac{1}{h} \bigl( f_n(s + h) - f_n(s) \bigr) = \int_{n}^{n+1} \frac{\{t\}}{t^{s+1}} \left( -\log t + \frac{1}{h} f(h, t) \right) dt. \]

The left-hand side for small h should be close to the derivative of $f_n$, and we
can make an intelligent guess at what this derivative will be:

\[ \left| \frac{1}{h} \bigl( f_n(s + h) - f_n(s) \bigr) + \int_{n}^{n+1} \frac{\{t\}}{t^{s+1}} \log t\, dt \right| \le \int_{n}^{n+1} \frac{1}{t^{\sigma+1}} \left| \frac{1}{h} f(h, t) \right| dt. \]

The right-hand side tends to zero as h → 0 by Equation (8.28). This completes
the first proof of the analytic continuation of the zeta function to ℜ(s) > 0.


Lemma 8.30 gives the analytic continuation of ζ to the half-plane ℜ(s) > 0
by Equation (8.26), apart from a simple pole at s = 1 with residue 1.


A natural question is to ask why it was necessary to split the integral from 1
to ∞ into a sum of subintegrals. The reason is that the Taylor approximation
in Equation (8.28) is only valid for bounded values of h log t. If we had tried to
make a similar argument for the integral from 1 to ∞, t would be unbounded
and the quantity h log t would be unbounded. By the splitting of the integral,
we had only to consider t ∈ [n, n + 1] for a fixed n at each stage.

These are treacherous waters! We are catching a glimpse here of how quite 
reasonable questions about the Riemann zeta function turn out to involve 
potentially subtle analytic problems. The methods we have just used can be 
squeezed to give a little more. The second proof of Theorem 8.29 (given below) 
will give an analytic continuation to the whole plane. 


Exercise 8.23. (a) Show that

\[ A(s) = 1 - \frac{1}{2^{s}} + \frac{1}{3^{s}} - \frac{1}{4^{s}} + \cdots \]

is analytic for ℜ(s) > 0.
(b) Show that $A(s) = \left( 1 - \frac{1}{2^{s-1}} \right) \zeta(s)$, and deduce the analytic continuation
of the Riemann zeta function to ℜ(s) > 0.
(c) Repeat (a) and find a suitable analog of (b) for 
1 1 1 1 1 
B(s) =1 Pa bie 
(s) os 28 35 4s 5s 65 * 


(d) Deduce from (c) that the only pole of ¢ in R(s) > 0 is at s =1. 
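The relation in part (b) can be explored numerically. The sketch below assumes that A(s) is the alternating series $1 - 2^{-s} + 3^{-s} - \cdots$ and that $A(s) = (1 - 2^{1-s})\zeta(s)$; it uses the relation to estimate ζ at a real point of the strip 0 < s < 1, where the original series for ζ diverges. Averaging two consecutive partial sums is an elementary acceleration device chosen for this illustration, and the function names are hypothetical.

```python
from math import pi


def alternating_partial(s, N):
    """Partial sum 1 - 1/2^s + 1/3^s - ... up to the N-th term."""
    return sum((-1) ** (n - 1) / n**s for n in range(1, N + 1))


def zeta_from_alternating(s, N=100000):
    """Estimate zeta(s) for real s > 0, s != 1, via A(s) = (1 - 2^(1-s)) zeta(s)."""
    # Consecutive partial sums of an alternating series bracket the limit,
    # so their average is a better estimate than either one alone.
    a = 0.5 * (alternating_partial(s, N) + alternating_partial(s, N + 1))
    return a / (1 - 2 ** (1 - s))


print(zeta_from_alternating(2), pi**2 / 6)   # agreement in the region of convergence
print(zeta_from_alternating(0.5))            # a value of the continued function
```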


Exercise 8.24. Prove that the Laurent expansion of ζ about s = 1 begins

\[ \zeta(s) = \frac{1}{s - 1} + \gamma + \phi(s), \]  (8.29)

where φ denotes a function that is analytic at s = 1 and vanishes there.


The next proof gives the full continuation to the whole complex plane 
(with the exception of the simple pole at s = 1). 


SECOND PROOF OF THEOREM 8.28. Consider

\[ \int_{1}^{\infty} x^{-s}\, dx = \left[ \frac{x^{1-s}}{1 - s} \right]_{1}^{\infty} = \frac{1}{s - 1}. \]

We subtract this from ζ(s) to remove the pole at s = 1. For ℜ(s) > 1,

\[ \begin{aligned} \zeta(s) - \frac{1}{s - 1} &= \sum_{n=1}^{\infty} \frac{1}{n^{s}} - \int_{1}^{\infty} x^{-s}\, dx \\ &= \sum_{n=1}^{\infty} \left( \frac{1}{n^{s}} - \int_{n}^{n+1} x^{-s}\, dx \right) \\ &= \sum_{n=1}^{\infty} \frac{1}{n^{s}} \left( 1 - \int_{0}^{1} \left( 1 + \frac{x}{n} \right)^{-s} dx \right). \end{aligned} \]  (8.30)

The sums involved here converge absolutely in the region ℜ(s) > 1.
Now assume that we have continued the zeta function to the domain

\[ \Re(s) > 1 - K \]

for some integer K > 0. We want to continue it further to ℜ(s) > −K. To do
this, put h = x/n and use a Taylor approximation for

\[ f_s(h) = (1 + h)^{-s} \]

of order K. Recall that the Taylor polynomial of degree K for $f_s$ at h = 0 is
defined by³

\[ T_{f,s,K}(h) = \sum_{k=0}^{K} \frac{f_s^{(k)}(0)}{k!}\, h^{k}. \]  (8.31)

³ If you only want the continuation to ℜ(s) > 0, think of K = 1: The Taylor
polynomial for $(1 + x/n)^{-s}$ in this case is simply $1 - \frac{sx}{n}$, and the error term
is $O(n^{-2})$. Substitute Equation (8.31) into Equation (8.30), and you get a series
convergent for ℜ(s) > 0.


We have to calculate higher derivatives of $f_s$ at h = 0. These are given by
setting h = 0 in the relation

\[ f_s^{(k)}(h) = \frac{(-1)^{k} (s + k - 1)(s + k - 2) \cdots (s + 1)\, s}{(1 + h)^{k}}\, f_s(h), \]

which is easy to prove by induction on k. Since $f_s$ is analytic on the neighborhood
of h = 0, we have an estimate for the error term of the form

\[ |f_s(h) - T_{f,s,K}(h)| \le \frac{\bigl| f_s^{(K+1)}(h')\, h^{K+1} \bigr|}{(K + 1)!} \]  (8.32)

for some h' with |h'| ≤ |h|. For bounded values of s, this is $O(|h|^{K+1})$. Use the
Taylor polynomial with h = x/n in Equation (8.30). We evaluate the inner
integral first:

\[ \int_{0}^{1} \left( 1 + \frac{x}{n} \right)^{-s} dx = 1 + \sum_{k=1}^{K} \frac{f_s^{(k)}(0)}{(k + 1)!}\, \frac{1}{n^{k}} + O\!\left( \frac{1}{n^{K+1}} \right). \]

Putting this into Equation (8.30) gives the nice identity

\[ \zeta(s) - \frac{1}{s - 1} = -\sum_{k=1}^{K} \frac{f_s^{(k)}(0)}{(k + 1)!}\, \zeta(s + k) + \sum_{n=1}^{\infty} \frac{1}{n^{s}} \int_{0}^{1} \left( T_{f,s,K}\!\left( \frac{x}{n} \right) - \left( 1 + \frac{x}{n} \right)^{-s} \right) dx. \]  (8.33)

The last sum converges by the inequality (8.32), and for all s ≠ 0, −1, −2, ...
the values

\[ \zeta(s + 1), \zeta(s + 2), \ldots, \zeta(s + K) \]

are all defined by hypothesis, giving the continuation of the zeta function to
ℜ(s) > −K.


In the case s = −m = 0, −1, −2, ..., 1 − K, one of the arguments of ζ in the
first summand of Equation (8.33) becomes 1, but this is no problem since the
pole is cancelled out by the appropriate factor (s + m) in the coefficient. (The
right-hand side of Equation (8.33) has a removable singularity there.)
Thus, by induction, there is an analytic continuation of the zeta function
to
\[ \Re(s) > -1, -2, \ldots, \]
in fact to the whole of the complex plane.
Exercise 8.25. Recall that π(x) denotes the number of primes less than or
equal to x. Prove that if s = σ + it with σ > 1, then

\[ \log \zeta(s) = s \int_{2}^{\infty} \frac{\pi(x)}{x (x^{s} - 1)}\, dx. \]

To do this, convert the sum over all primes to a sum over all natural numbers
using

\[ \pi(n) - \pi(n - 1) = \begin{cases} 1 & \text{if } n \text{ is prime,} \\ 0 & \text{otherwise.} \end{cases} \]


The next exercise is an easy version of what has been done for the Riemann
zeta function $\zeta(s) = \sum_{n=1}^{\infty} e^{-s \log n}$.


Exercise 8.26. Define a function F by

\[ F(s) = \sum_{n=0}^{\infty} e^{-sn}. \]

(a) Find the domain of convergence for this series.
(b) Prove that the series converges uniformly for ℜ(s) > δ for any fixed δ > 0.
(c) Using (b), find the derivative of F by differentiating term by term.
(d) Find a simple expression for F by viewing it as a geometric progression
and summing it.
(e) Differentiate this closed form and expand the answer using the Binomial
Theorem. Check that you get the series in (c) again.
(f) Obtain the analytic continuation of F to the whole complex plane. Describe
the location and order of all the poles of F.
(g) Compute F(−1).

CO 
Exercise 8.28. Prove that ¢?(s) “Se 


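Exercise 8.29. Prove that $\frac{\zeta(s - 1)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\phi(n)}{n^{s}}$ for ℜ(s) > 2.


Exercise 8.30. Prove that

\[ \sum_{\substack{m, n \ge 1 \\ \gcd(m,n) = 1}} \frac{1}{(mn)^{2}} = \frac{\zeta^{2}(2)}{\zeta(4)}. \]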


NOTES TO CHAPTER 8: A treatment of the Prime Number Theorem at a level
similar to ours may be found in Jameson’s book [81]; this book also includes Selberg’s
elementary proof of the Prime Number Theorem [136] (not using analytic
methods). The quantity $\theta(n) = \sum_{p \le n} \log p$ from Lemma 1.8 plays a central role
in the elementary proof; indeed the statement $\theta(n)/n \to 1$ as n → ∞ is equivalent
to the theorem. Erdős [52] also published an elementary proof of the Prime Number
Theorem; a careful account of the controversy is provided by Goldfeld [70]. An
accessible proof of the important inequality (8.8) may be found in Spivak’s lovely
book [145, p. 543]. Exercise 8.6(b) is a special case of a result due to Moss [111].
The implications (8.9) are in the book of Apostol [4, p. 91]. Tchebychef’s proof of
Exercise 8.7 appeared originally in his paper [152] of 1852. Exercise 8.9 is due to
Gandhi [65]; an accessible treatment is in a paper of Golomb [71]. Exercise 8.11(a)
and (c) are taken from a paper of Brent [20]; Exercise 8.11(b) comes from a paper of
Jänichen [82]. The proof of Zsigmondy’s Theorem in Section 8.3.1 is adapted from
a more general result of Schinzel [133]. A readable account of Exercise 8.19 appears
in a paper of Yabuta [165]. For further reading on Section 8.7, consult Apostol [4] or
Titchmarsh [153]. Edwards’ book [47] contains a translation of Riemann’s original
paper [128]. Fourier analysis, and its more august cousin, harmonic analysis, plays
a central role in number theory. For sophisticated accounts, see Tate’s thesis [150]
or Weil’s book [158]. A more accessible account of this advanced material may be
found in the book of Ramakrishnan and Valenza [121].
The Functional Equation of the Riemann Zeta Function


A functional equation is simply an identity involving functions. The trigonometric
identity $\sin^{2}(x) + \cos^{2}(x) = 1$ is one of the most familiar examples. In
this chapter, a highly nontrivial functional equation satisfied by the Riemann
zeta function is found, which allows the function to be extended to the whole
complex plane (apart from the singularity at s = 1).

The proof is difficult, and it might seem that we have strayed far from the
arithmetic path that started with the Fundamental Theorem of Arithmetic.
However, the proof relies crucially on some observations that arose because
of the way mathematicians, particularly in the nineteenth century, thought
about functions. If a function f : C → C has distinct zeros $z_1, z_2, \ldots$, then it
seems natural to “factorize” it, and hope that

\[ f(z) = (z - z_1)(z - z_2) \cdots. \]

Of course, convergence issues arise, and occasionally some careful doctoring
is needed to make this idea precise and useful.


9.1 The Gamma Function 


The Gamma function is one of many classical special functions. It is surpris- 
ing how the Gamma function helps us to understand properties of the zeta 
function and some other arithmetic problems. 


Definition 9.1. The Gamma function Γ is defined by

\[ \Gamma(s) = \int_{0}^{\infty} e^{-t} t^{s-1}\, dt \]  (9.1)

for any s ∈ C with ℜ(s) > 0.


Exercise 9.1. Prove that the integral in Equation (9.1) exists for any s ∈ C
with ℜ(s) > 0.

As with the Riemann zeta function, it is important to establish the analytic
properties of the Gamma function.


Exercise 9.2. Prove that Γ(s) is an analytic function of s for ℜ(s) > 0 by
proving the following statements.
(a) $\Gamma(s) = \sum_{n=0}^{\infty} \Gamma_n(s)$, where $\Gamma_n(s) = \int_{n}^{n+1} e^{-t} t^{s-1}\, dt$.
(b) For any fixed δ > 0, $\sum_{n=0}^{N} \Gamma_n(s) \to \Gamma(s)$ uniformly on {s ∈ C | ℜ(s) > δ}
as N → ∞.
(c) $\Gamma_n$ is analytic for any n ≥ 0.

All these steps are very similar to the argument for $\int_{1}^{\infty} \frac{\{t\}}{t^{s+1}}\, dt$.


Later on, another (better) proof that Γ is analytic will be given.
Lemma 9.2. The Gamma function has the following properties. 
(1) For all s with R(s) > 0, 

I'(s+1)=sI(s). 
(2) For all integers N > 0, 
I(N+1)=N!. 


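PROOF. The first relation is found by integrating by parts,

\[ \Gamma(s+1) = \int_0^\infty e^{-t} t^{s}\,dt = \left[-e^{-t} t^{s}\right]_0^\infty + s\int_0^\infty e^{-t} t^{s-1}\,dt. \]

The first term vanishes at $t = 0$ because $\Re(s) > 0$, and at infinity because of
the exponential decay, so the right-hand side equals $s\,\Gamma(s)$.
The second statement follows from the first by induction together with the
easy calculation that $\Gamma(1) = 1$.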
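
Lemma 9.2 is easy to test numerically. The following is a minimal sketch, not part of the original text: it approximates the integral in Definition 9.1 by Simpson's rule on a truncated interval and compares the results with the two statements of the lemma. The function name `gamma_integral`, the cutoff `upper` and the number of `steps` are choices made here for illustration, and the routine is only intended for real $s \ge 2$, where the integrand is smooth at $t = 0$.

```python
import math


def gamma_integral(s: float, upper: float = 60.0, steps: int = 60_000) -> float:
    """Approximate Gamma(s) = int_0^inf e^{-t} t^{s-1} dt by Simpson's rule.

    The integral is truncated to [0, upper]; `steps` must be even.
    Intended for real s >= 2 (illustrative assumption).
    """
    h = upper / steps

    def integrand(t: float) -> float:
        return math.exp(-t) * t ** (s - 1.0)

    total = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        # Simpson weights alternate 4, 2, 4, ... on the interior points.
        total += (4 if j % 2 else 2) * integrand(j * h)
    return total * h / 3.0


if __name__ == "__main__":
    # Lemma 9.2(2): Gamma(N + 1) = N!
    print(gamma_integral(5.0), math.factorial(4))
    # Lemma 9.2(1): Gamma(s + 1) = s * Gamma(s), here with s = 3.5
    print(gamma_integral(4.5), 3.5 * gamma_integral(3.5))
```

The two printed pairs should agree to many decimal places, although the agreement is of course only evidence for the lemma, not a proof.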


Proposition 9.3. The Gamma function can be analytically continued to all
of $\mathbb{C}$, where it is analytic apart from simple poles at $0, -1, -2, \ldots$.


PROOF. By Lemma 9.2(1), we may write

\[ \Gamma(s) = \frac{1}{s}\,\Gamma(s+1). \]

The right-hand side is defined for $\Re(s) > -1$ apart from $s = 0$, where it has
a simple pole with residue $\Gamma(1) = 1$. Iterating this gives

\[ \Gamma(s) = \frac{1}{s(s+1)}\,\Gamma(s+2). \tag{9.2} \]

The right-hand side of Equation (9.2) is defined for $\Re(s) > -2$, apart from $s = 0, -1$,
where there are simple poles again. In this way, we can inductively
continue the Gamma function to the whole plane, where it is analytic apart
from simple poles at $0, -1, -2, \ldots$.


Theorem 9.4. $\Gamma(s) \ne 0$ for all $s \in \mathbb{C}$.
This will be proved in Section 9.6.


9.2 The Functional Equation


Our goal throughout this chapter will be the proof of the following theorem. 


Theorem 9.5. [THE FUNCTIONAL EQUATION] Let

\[ F(s) = \pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s) \]

for $\Re(s) > 0$. Then $F$ satisfies the functional equation

\[ F(1 - s) = F(s). \]


Corollary 9.6. The function $F$ has an analytic continuation to the whole
complex plane apart from poles at 1 and 0. The Riemann zeta function has
an analytic continuation to the complex plane where it is analytic apart from
a simple pole at $s = 1$. The zeta function vanishes at negative even integers.


PROOF. Expand Theorem 9.5 to give

\[ \pi^{-(1-s)/2}\,\Gamma\!\left(\frac{1-s}{2}\right)\zeta(1-s) = \pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s), \]

so

\[ \zeta(1-s) = \frac{\pi^{1/2-s}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s)}{\Gamma\!\left(\frac{1-s}{2}\right)}. \tag{9.3} \]

We know that $\Gamma(s)$ has a simple pole at $s = -m$ for $m \in \mathbb{N}$. Thus, for
all $m \in \mathbb{N}$,

\[ \zeta(-2m) = 0 \]

since $1 - s = -2m$ if and only if $s = 2m + 1$, $\zeta(s) \ne 0$ for $\Re(s) > 1$ by
the Euler product expansion, and $\Gamma \ne 0$ everywhere. The case $s = 1$ is
different: Here the right-hand side has a simple pole in the numerator, too
(in $\zeta$), cancelling the one in $\Gamma$. Thus $\zeta(s)$ is analytic and nonzero at $s = 0$.
By the functional equation, the values of $F(s)$ for $\Re(s) \ge 1/2$ determine all
of $F$. We found $\zeta(-2m) = 0$ for all $m \in \mathbb{N}$, and there are no more zeros of $\zeta$
with $\Re(s) < 0$ because $\Gamma$ has no other poles by Equation (9.3). Also, $\zeta(s) \ne 0$
for $\Re(s) > 1$ because of the Euler product expansion (see Remark 1.6).
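
Equation (9.3) can be used as a practical recipe for values of the zeta function to the left of the critical strip. The sketch below is an illustration under stated assumptions rather than anything from the text: `zeta_real` sums the Dirichlet series for real $s > 1$ with a standard Euler-Maclaurin tail correction, `reciprocal_gamma` returns $1/\Gamma$ with the value $0$ at the poles of $\Gamma$, and `zeta_reflected` evaluates the right-hand side of Equation (9.3). All three names are introduced here for convenience.

```python
import math


def zeta_real(s: float, terms: int = 1000) -> float:
    """zeta(s) for real s > 1: partial sum plus Euler-Maclaurin tail terms."""
    n = terms
    partial = sum(k ** -s for k in range(1, n + 1))
    tail = n ** (1.0 - s) / (s - 1.0) - 0.5 * n ** -s + s * n ** (-s - 1.0) / 12.0
    return partial + tail


def reciprocal_gamma(x: float) -> float:
    """1/Gamma(x) for real x, equal to 0 at the poles x = 0, -1, -2, ..."""
    if x <= 0 and x == math.floor(x):
        return 0.0
    return 1.0 / math.gamma(x)


def zeta_reflected(s: float) -> float:
    """zeta(1 - s) from the right-hand side of Equation (9.3), real s > 1."""
    return (
        math.pi ** (0.5 - s)
        * math.gamma(s / 2.0)
        * zeta_real(s)
        * reciprocal_gamma((1.0 - s) / 2.0)
    )


if __name__ == "__main__":
    print(zeta_reflected(2.0))  # zeta(-1), compare with -1/12
    print(zeta_reflected(3.0))  # zeta(-2), a trivial zero
    print(zeta_reflected(4.0))  # zeta(-3), compare with 1/120
```

The trivial zeros appear exactly where the argument of the Gamma function in the denominator hits a pole, which is the mechanism used in the proof above.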


In the course of the proof, we found a set of special values of the zeta 
function at negative even integers. Later we will see that negative odd integers 
yield rational values of the zeta function (see Exercise 9.10 on p. 204). 


9.2.1 The Riemann Hypothesis 


Corollary 9.6 gives another proof that the Riemann zeta function can be
continued to the whole plane, where it is analytic apart from a simple pole
at $s = 1$. Moreover, any nontrivial zero of $\zeta$ must lie in the critical strip defined
by $0 < \Re(s) < 1$. Riemann stated without proof the following conjecture.


Conjecture 9.7. [THE RIEMANN HYPOTHESIS] All zeros of $\zeta$ in the critical
strip $0 < \Re(s) < 1$ have $\Re(s) = \frac{1}{2}$.


This conjecture is still unresolved, and it is viewed as one of the outstanding
open problems in mathematics. All the zeros found thus far (the
first ten billion are known) lie on the line $\Re(s) = \frac{1}{2}$, and they are all simple.
Figure 9.1 shows $\Re(\zeta(\frac{1}{2} + it))$ for $0 \le t \le 60$, which already shows the
extraordinary subtlety and complexity of the zeta function along the critical
line.


Figure 9.1. The graph of $\Re(\zeta(\frac{1}{2} + it))$ for $0 \le t \le 60$.


Just as the Prime Number Theorem is equivalent to a statement about the
partial sums of the Möbius function, the Riemann Hypothesis is equivalent to
the statement that for every $\varepsilon > 0$

\[ \sum_{n \le x} \mu(n) = O\!\left(x^{1/2 + \varepsilon}\right). \]

Growth properties of the Möbius function are very delicate, and the numerical
evidence can be deceptive. A long-standing conjecture of Mertens, supported
by a great deal of numerical evidence, was that

\[ \left|\sum_{n \le x} \mu(n)\right| < \sqrt{x}; \]

this was eventually disproved by Odlyzko and te Riele in 1985.
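
To see why the numerical evidence for the conjecture of Mertens looked so convincing, one can tabulate the partial sums of the Möbius function directly. The following sketch is only an illustration: the function names and the range searched are choices made here, and a search of this size says nothing about the very large values of $x$ where the conjecture is known to fail.

```python
def mobius_sieve(limit: int) -> list[int]:
    """Return a list mu with mu[n] = Moebius function of n for 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    is_composite = [False] * (limit + 1)
    primes: list[int] = []
    for n in range(2, limit + 1):
        if not is_composite[n]:
            primes.append(n)
            mu[n] = -1
        for p in primes:
            if n * p > limit:
                break
            is_composite[n * p] = True
            if n % p == 0:
                mu[n * p] = 0  # p^2 divides n * p
                break
            mu[n * p] = -mu[n]
    return mu


def mertens_values(limit: int) -> list[int]:
    """Return M with M[x] = sum of mu(n) over 1 <= n <= x (and M[0] = 0)."""
    mu = mobius_sieve(limit)
    partial_sums = [0] * (limit + 1)
    for n in range(1, limit + 1):
        partial_sums[n] = partial_sums[n - 1] + mu[n]
    return partial_sums


def mertens_bound_holds(limit: int) -> bool:
    """Check |M(x)| < sqrt(x) for every integer 2 <= x <= limit."""
    m = mertens_values(limit)
    return all(m[x] * m[x] < x for x in range(2, limit + 1))


if __name__ == "__main__":
    values = mertens_values(10_000)
    print(values[10], values[100], values[1000])
    print(mertens_bound_holds(10_000))
```

The bound is checked from $x = 2$ onward because the partial sum at $x = 1$ equals $1 = \sqrt{1}$.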

It is reasonable to ask why certain problems, such as the Riemann Hypoth- 
esis, obtain legendary status. Certainly this one has attracted considerable 
folklore. David Hilbert is reputed to have said that if he were to be awoken in
a thousand years, the first question he would ask would be about the status
of the Riemann Hypothesis. Many mathematicians believe it must be true, 
although some great figures have been sceptical. The explanation for its im- 
portance is multi-faceted. On the one hand, its statement has great beauty 
and simplicity, while the many unsuccessful attempts to resolve it have driven 
forward sophisticated methods in number theory. On the other hand — perhaps 
more germane to our study — the Riemann Hypothesis is intimately connected 
to the distribution of the primes. Many results in analytic number theory can 
be proved in stronger forms if the Riemann Hypothesis is assumed. Less ob- 
viously, but perhaps most importantly, the Riemann Hypothesis seems to lie 
at the heart of future developments in the area of overlap between number 
theory, geometry and analysis. Workers in this area sometimes need an al- 
most prophetic insight that can lead to layers of conjectures about how hard 
unsolved problems will eventually be cracked. Much of this has to do with 
functions that generalize the Riemann zeta function, called L-functions (we
will encounter an L-function in the next chapter). The Riemann Hypothesis
seems to be a basic example of a whole series of results that will be needed 
to make progress in this area. Finally, in addition to its central role in num- 
ber theory, the Riemann Hypothesis is conjectured to relate to problems in 
physics — the zeros of the zeta function corresponding to the eigenvalues of an 
appropriate Hermitian operator. 

The Clay Mathematics Institute* has offered a million dollars for a proof
of the Riemann Hypothesis. The prize is not on offer for a disproof, say by 
giving a counterexample. 


9.3 Fourier Analysis on Schwartz Spaces 


For the proof of the functional equation in Theorem 9.5, we will need some 
Fourier analysis. 


Definition 9.8. The Schwartz space $\mathcal{S}$ is the set of functions $f : \mathbb{R} \to \mathbb{C}$ that
are infinitely differentiable and whose derivatives $f^{(n)}$ (including the function
itself, $f^{(0)} = f$) all satisfy

\[ (1 + |x|)^m f^{(n)}(x) = O(1) \tag{9.4} \]

for all $m, n \in \mathbb{N}$. The bound in $O(1)$ may depend upon $m$ and $n$.


Example 9.9. The Gaussian function $f(x) = e^{-x^2}$ is in $\mathcal{S}$.


* On May 24th 2000, the Clay Mathematics Institute established seven Millennium 
Prize Problems, each worth one million dollars, including the Riemann Hypothesis 
because “they are important classic questions that have resisted solution over the 
years.” 




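Notice that $\mathcal{S}$ is a complex vector space and that any function $f \in \mathcal{S}$ is
integrable,

\[ \left|\int_{-\infty}^{\infty} f(x)\,dx\right| \le \int_{-\infty}^{\infty} |f(x)|\,dx \le C \int_{-\infty}^{\infty} \frac{1}{(1+|x|)^2}\,dx < \infty, \]

just by taking $n = 0$ and $m = 2$ in Equation (9.4).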


Definition 9.10. For any function $f \in \mathcal{S}$, the Fourier transform of $f$ is the
function

\[ \hat{f}(y) = \int_{-\infty}^{\infty} f(x)\,e^{-2\pi i x y}\,dx. \]

The integral exists for the same reason as before,

\[ |\hat{f}(y)| \le \int_{-\infty}^{\infty} |f(x)|\,dx < \infty, \]

and in fact $\hat{f} \in \mathcal{S}$ again since we may apply Equation (9.4) with $m = n$ to get
the bound for $\hat{f}^{(n)}$.

Thus $f \mapsto \hat{f}$ is a linear map from $\mathcal{S}$ to $\mathcal{S}$. It turns out that this
map has a fixed point, that is, a function equal to its Fourier transform. Recall
that $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.

Lemma 9.11. If $f(y) = e^{-\pi y^2}$ then $\hat{f}(y) = f(y)$.


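PROOF. The idea is to complete the square,

\[ -\pi(x^2 + 2ixy) = -\pi\left[(x + iy)^2 + y^2\right], \]

so the Fourier transform of $f$ is

\[ \hat{f}(y) = \int_{-\infty}^{\infty} e^{-\pi(x^2 + 2ixy)}\,dx = e^{-\pi y^2} \int_{-\infty}^{\infty} e^{-\pi(x+iy)^2}\,dx. \]

Let

\[ I(y) = \int_{-\infty}^{\infty} e^{-\pi(x+iy)^2}\,dx. \]

We know that $I(0) = 1$. What happens if $y \ne 0$? Fix some large $N$ and
consider the following paths:

\[ \gamma_1 = [-N, N], \quad \gamma_2 = [N, N + yi], \quad \gamma_3 = [N + yi, -N + yi], \quad \gamma_4 = [-N + yi, -N]. \]

Put $\gamma = \gamma_1 + \gamma_2 + \gamma_3 + \gamma_4$ (a rectangle). Since $e^{-\pi z^2}$ is an analytic function
on the whole of the complex plane, we have, for any $N > 0$,

\[ \int_{\gamma} e^{-\pi z^2}\,dz = 0. \]

Now, as $N \to \infty$, the integral of $e^{-\pi z^2}$ over $\gamma_1$ tends to $I(0) = 1$, the integral
over $\gamma_3$ tends to $-I(y)$, and the integrals over $\gamma_2$ and $\gamma_4$ both tend to 0.
Hence $I(y) = I(0) = 1$ and $\hat{f}(y) = e^{-\pi y^2} = f(y)$. This completes the proof of Lemma 9.11.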
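
Lemma 9.11 can be checked numerically without any contour integration. The sketch below is an illustration only: it approximates the Fourier transform of the Gaussian $e^{-\pi x^2}$ by the trapezoid rule on a truncated interval. Because the integrand is even in $x$, the imaginary part of the transform vanishes, so only the cosine part is integrated. The names `gaussian` and `fourier_transform_gaussian`, the truncation `half_width` and the number of `steps` are assumptions of this sketch.

```python
import math


def gaussian(x: float) -> float:
    """The function f(x) = exp(-pi x^2) of Lemma 9.11."""
    return math.exp(-math.pi * x * x)


def fourier_transform_gaussian(y: float, half_width: float = 8.0, steps: int = 16_000) -> float:
    """Trapezoid approximation to int f(x) exp(-2 pi i x y) dx over [-half_width, half_width].

    Only the real (cosine) part is computed; the sine part integrates to zero
    because f is even.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * gaussian(x) * math.cos(2.0 * math.pi * x * y)
    return total * h


if __name__ == "__main__":
    for y in (0.0, 0.5, 1.3):
        print(y, fourier_transform_gaussian(y), gaussian(y))
```

For each $y$ the two printed values should agree to within rounding error, in line with the lemma.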


Exercise 9.3. Prove that $\int_{\gamma_2} e^{-\pi z^2}\,dz \to 0$ as $N \to \infty$ for any $y \in \mathbb{R}$.


9.4 Fourier Analysis of Periodic Functions 


Fourier analysis is more familiar in the setting of periodic functions. 
Definition 9.12. A function $g : \mathbb{R} \to \mathbb{C}$ is periodic with period 1 if
$g(x) = g(x + 1)$ for all $x \in \mathbb{R}$.


If $g$ is periodic and piecewise continuous, then its $k$th Fourier coefficient is
defined for $k \in \mathbb{Z}$ by

\[ c_k = \int_0^1 g(x)\,e^{-2\pi i k x}\,dx, \]

and its Fourier series is the function

\[ G(x) = \sum_{k \in \mathbb{Z}} c_k\,e^{2\pi i k x}. \]


Lemma 9.13. If $g$ is periodic and twice differentiable with continuous second
derivative, then there exists a constant $C > 0$, depending only upon $g$, such
that

\[ |c_k| \le \frac{C}{k^2} \]

for all $k \ne 0$.


PROOF. Integrate by parts:

\[ c_k = \int_0^1 g(x)\,e^{-2\pi i k x}\,dx = \left[\frac{-e^{-2\pi i k x} g(x)}{2\pi i k}\right]_0^1 + \frac{1}{2\pi i k}\int_0^1 e^{-2\pi i k x} g'(x)\,dx. \]

Now the bracketed term vanishes because $g$ is periodic. Integrate by parts
again, so that $k^2$ appears in the denominator, and then bound the exponential
by 1. Finally, put $C = \int_0^1 |g''(x)|\,dx / (4\pi^2)$.
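
The decay of Fourier coefficients in Lemma 9.13 is visible in a small experiment. In the sketch below, which is an illustration and not part of the text, the test function $g(x) = e^{\cos(2\pi x)}$ is an arbitrary smooth periodic example chosen here, its second derivative is written out by hand, and both the coefficients $c_k$ and the constant $C = \int_0^1 |g''(x)|\,dx/(4\pi^2)$ are approximated by equally spaced Riemann sums. All function names are hypothetical.

```python
import cmath
import math


def g(x: float) -> float:
    """A smooth function of period 1 (chosen only as an example)."""
    return math.exp(math.cos(2.0 * math.pi * x))


def g_second(x: float) -> float:
    """Second derivative of g, computed by hand."""
    c = math.cos(2.0 * math.pi * x)
    s = math.sin(2.0 * math.pi * x)
    return 4.0 * math.pi ** 2 * (s * s - c) * math.exp(c)


def fourier_coefficient(k: int, samples: int = 4096) -> complex:
    """c_k = int_0^1 g(x) exp(-2 pi i k x) dx via an equally spaced Riemann sum."""
    total = sum(
        g(j / samples) * cmath.exp(-2j * math.pi * k * j / samples)
        for j in range(samples)
    )
    return total / samples


def lemma_constant(samples: int = 4096) -> float:
    """C = (1 / (4 pi^2)) * int_0^1 |g''(x)| dx, approximated by a Riemann sum."""
    mean_abs = sum(abs(g_second(j / samples)) for j in range(samples)) / samples
    return mean_abs / (4.0 * math.pi ** 2)


if __name__ == "__main__":
    C = lemma_constant()
    for k in range(1, 8):
        print(k, abs(fourier_coefficient(k)), C / k ** 2)
```

For this particular $g$ the coefficients decay far faster than $C/k^2$, as one expects for an infinitely differentiable function; the lemma only uses two derivatives.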


Theorem 9.14. Any function $g$ that is periodic and differentiable infinitely
often has a Fourier series expansion

\[ g(x) = \sum_{k \in \mathbb{Z}} c_k\,e^{2\pi i k x} \]

that is uniformly convergent on $\mathbb{R}$.


PROOF. Let $G$ be the Fourier series of $g$, and apply Lemma 9.13:

\[ \left| G(x) - \sum_{k=-n}^{n} c_k\,e^{2\pi i k x} \right| \le \sum_{|k| > n} \frac{C}{k^2}, \]

where the last sum tends to zero independent of $x$ since the constant $C$ depends
only on $g$. This proves the convergence is uniform.

The equality $g(x) = G(x)$ is not so easy to prove. We first record a few
lemmas that are of interest in their own right.


Lemma 9.15. Consider the sequence of functions $(D_K)$ defined by

\[ D_K(x) = \sum_{k=-K}^{K} e^{2\pi i k x} \quad \text{for } K \in \mathbb{N}, \]

called the Dirichlet kernel. Then

\[ \int_0^1 D_K(x)\,dx = 1, \tag{9.5} \]

\[ D_K(x) = \frac{\sin((2K+1)\pi x)}{\sin(\pi x)}, \tag{9.6} \]

and

\[ \int_0^1 g(y+x)\,D_K(x)\,dx = \sum_{k=-K}^{K} c_k\,e^{2\pi i k y}, \tag{9.7} \]

where $c_k$ are the Fourier coefficients of $g$ as in Theorem 9.14.


The functions $D_K$ are useful because they concentrate at the origin and
pick out the Fourier coefficients conveniently. The shape of $D_K$ is illustrated
in Figure 9.2.


PROOF OF LEMMA 9.15. Equation (9.5) follows from the fact that

\[ \int_0^1 e^{2\pi i k x}\,dx = 0 \quad \text{for all } k \ne 0. \]

Equation (9.6) is proved by induction on $K$ or directly by summation of a
geometric progression. Equation (9.7) follows since

\[ \int_0^1 g(y+x)\,D_K(x)\,dx = \int_y^{y+1} g(x)\,D_K(x-y)\,dx = \int_0^1 g(x)\,D_K(y-x)\,dx. \]

In the last step, we have used the fact that $g$ and $D_K$ are periodic functions
and that $D_K$ is an even function. At this point, we put in the definition of
the $D_K$, interchange the integral and the sum, and extract a factor $e^{2\pi i k y}$
from each summand, which gives the right-hand side of Equation (9.7).


Lemma 9.16. [RIEMANN–LEBESGUE LEMMA] Let $g$ be a continuous periodic
function, and let $c_k$ be the $k$th Fourier coefficient of $g$. Then

\[ \lim_{|k| \to \infty} c_k = 0. \]


PROOF OF LEMMA 9.16. Define for continuous complex-valued periodic functions
$u, v$ the inner product

\[ (u, v) = \int_0^1 u(x)\,\overline{v(x)}\,dx \]

and the norm

\[ \|u\| = \sqrt{(u, u)}. \]

Let $u_k(x) = e^{2\pi i k x}$ so that $c_k = (g, u_k)$. Using the linearity of the inner product
and the orthogonality relations

\[ (u_k, u_\ell) = \begin{cases} 0 & \text{if } k \ne \ell, \\ 1 & \text{if } k = \ell, \end{cases} \]

we get

\[ \left\| g - \sum_{k=-K}^{K} (g, u_k)\,u_k \right\|^2 = \left( g - \sum_{k=-K}^{K} (g, u_k)\,u_k,\; g - \sum_{k=-K}^{K} (g, u_k)\,u_k \right) = \|g\|^2 - \sum_{k=-K}^{K} |(g, u_k)|^2 = \|g\|^2 - \sum_{k=-K}^{K} |c_k|^2. \]

Since the left-hand side is nonnegative, the sum on the right-hand side must
be bounded independently of $K$, so the series $\sum_{k=-\infty}^{\infty} |c_k|^2$ converges. In particular,
$c_k \to 0$ as $|k| \to \infty$.


An immediate consequence of the Riemann–Lebesgue Lemma is

\[ c_{-k} - c_k = 2i \int_0^1 g(x)\sin(2\pi k x)\,dx \longrightarrow 0 \quad \text{as } k \to \infty. \tag{9.8} \]


Now we are ready to complete the proof of Theorem 9.14. By Lemma 9.15,
the partial sums of the Fourier series are given by the left-hand side of Equation (9.7).
We manipulate this integral a little, using Equation (9.6):

\[ \int_{-1/2}^{1/2} g(y+x)\,D_K(x)\,dx = \int_{-1/2}^{1/2} \frac{g(y+x) - g(y)}{\sin(\pi x)}\,\sin((2K+1)\pi x)\,dx + \int_{-1/2}^{1/2} g(y)\,D_K(x)\,dx. \tag{9.9} \]

The last integral in Equation (9.9) simply equals $g(y)$ for all $K$ by the property
in Equation (9.5) of the Dirichlet kernel. For the first summand, observe that

\[ \frac{g(y+x) - g(y)}{\sin(\pi x)} \]

is a periodic continuous function for $x \in [-1/2, 1/2]$ (the limit at $x = 0$ exists
by l'Hôpital's rule). By Equation (9.8), this implies that the first summand
tends to zero as $K$ tends to infinity.


Theorem 9.17. [POISSON SUMMATION FORMULA] Suppose that $f$ belongs to
the Schwartz space $\mathcal{S}$. Then

\[ \sum_{m \in \mathbb{Z}} f(m) = \sum_{m \in \mathbb{Z}} \hat{f}(m). \]


PROOF. Let

\[ g(x) = \sum_{m \in \mathbb{Z}} f(x + m), \]

which is certainly convergent since $f \in \mathcal{S}$. Clearly, $g$ is periodic. Moreover, $g$
is differentiable infinitely often since

\[ \left| \sum_{|m| > N} f^{(n)}(x+m) \right| \le \sum_{|m| > N} \left| f^{(n)}(x+m) \right| \le C \sum_{|m| > N} \frac{1}{(1 + |x+m|)^2}, \]

where the last series tends to zero for $|x|$ bounded. Therefore the $n$th derivatives
of the partial sums converge uniformly by periodicity for all $n \ge 0$. We
cannot apply Theorem 8.23 since the functions $f_m(x) = f(x+m)$ are not necessarily analytic.
However, we can use Lemma 8.25 as follows. Let $\gamma$ be the real interval $[0, x]$,
let $G_N$ be the $N$th partial sum of the functions $f_m$, so that $G_N'$ is the $N$th partial
sum of the derivatives $f_m'$, and use the fundamental theorem of calculus to see that

\[ \int_{\gamma} G_N'(t)\,dt = G_N(x) - G_N(0). \]

The integral converges to $g(x) - g(0)$ as $N \to \infty$, and similarly for higher
derivatives, using induction. It follows that $g$ is $n$ times differentiable and
its $n$th derivative is the limit of that of the partial sums, so we may do Fourier
analysis on $g$.

Let $c_k$ be the $k$th Fourier coefficient of $g$. Then, by Theorem 9.14,

\[ g(x) = \sum_{k=-\infty}^{\infty} c_k\,e^{2\pi i k x}, \quad \text{so} \quad g(0) = \sum_{k=-\infty}^{\infty} c_k. \tag{9.10} \]

On the other hand,

\[ c_k = \int_0^1 g(x)\,e^{-2\pi i k x}\,dx = \int_0^1 \sum_{m=-\infty}^{\infty} f(x+m)\,e^{-2\pi i k x}\,dx = \sum_{m=-\infty}^{\infty} \int_0^1 f(x+m)\,e^{-2\pi i k x}\,dx. \]

This interchange of sum and integral is justified because the series for $g$ converges
uniformly by Lemma 8.25 again. Multiply each summand by a factor
of $e^{-2\pi i k m} = 1$ and substitute $x + m$ for $x$ to find

\[ c_k = \sum_{m=-\infty}^{\infty} \int_0^1 f(x+m)\,e^{-2\pi i k (x+m)}\,dx = \int_{-\infty}^{\infty} f(x)\,e^{-2\pi i k x}\,dx = \hat{f}(k). \]

Now

\[ \sum_{m=-\infty}^{\infty} f(m) = g(0) = \sum_{k=-\infty}^{\infty} c_k = \sum_{k=-\infty}^{\infty} \hat{f}(k) \]

by Equation (9.10). This completes the proof of the Poisson Summation Formula.


9.5 The Theta Function 


Another classical special function we need is the theta function. This satisfies 
a surprising functional equation, which plays a key role in the proof of the 
functional equation for the zeta function. 


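Theorem 9.18. For real $y > 0$, define the theta function by

\[ \theta(y) = \sum_{n=-\infty}^{\infty} e^{-\pi n^2 y}. \]

Then

\[ \theta\!\left(\frac{1}{y}\right) = \sqrt{y}\,\theta(y). \]


PROOF. This relation is far from obvious and looks barely possible. The series
defining $\theta$ converges uniformly in the range $y \ge \delta$ for any fixed $\delta > 0$. Fix
some real $b > 0$ and define, with $f(y) = e^{-\pi y^2}$ as in Lemma 9.11,

\[ f_b(y) = f(by) = e^{-\pi b^2 y^2}. \]

Of course, $f_b$ is in the Schwartz space $\mathcal{S}$, so we may apply the Poisson Summation
Formula (Theorem 9.17) to obtain

\[ \sum_{n=-\infty}^{\infty} f_b(n) = \sum_{n=-\infty}^{\infty} \hat{f}_b(n). \tag{9.11} \]

Next, we need to compute $\hat{f}_b(y)$:

\[ \hat{f}_b(y) = \int_{-\infty}^{\infty} f_b(x)\,e^{-2\pi i x y}\,dx = \int_{-\infty}^{\infty} f(bx)\,e^{-2\pi i x y}\,dx. \tag{9.12} \]

Now put $u = bx$, so $dx = \frac{1}{b}\,du$. Thus, Equation (9.12) becomes

\[ \hat{f}_b(y) = \frac{1}{b} \int_{-\infty}^{\infty} f(u)\,e^{-2\pi i u y / b}\,du = \frac{1}{b}\,\hat{f}\!\left(\frac{y}{b}\right). \]

Apply Lemma 9.11 to this equation to see that

\[ \hat{f}_b(y) = \frac{1}{b}\,f\!\left(\frac{y}{b}\right). \]

Put this result into Equation (9.11), and use the definition of $f$ again to obtain

\[ \sum_{n=-\infty}^{\infty} e^{-\pi n^2 b^2} = \frac{1}{b} \sum_{n=-\infty}^{\infty} e^{-\pi n^2 / b^2}. \]

Finally, put $b = \sqrt{y}$ and the functional equation for $\theta$ emerges.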
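
Since the functional equation for the theta function "looks barely possible", it is reassuring to test it numerically. The sketch below is an illustration only: `theta` sums the defining series over $|n| \le$ `terms`, a truncation chosen here that is more than adequate for the values of $y$ used, and `theta_equation_gap` measures the difference between the two sides of Theorem 9.18.

```python
import math


def theta(y: float, terms: int = 60) -> float:
    """theta(y) = sum over all integers n of exp(-pi n^2 y), truncated to |n| <= terms."""
    tail = sum(math.exp(-math.pi * n * n * y) for n in range(1, terms + 1))
    return 1.0 + 2.0 * tail


def theta_equation_gap(y: float) -> float:
    """Absolute difference |theta(1/y) - sqrt(y) * theta(y)|."""
    return abs(theta(1.0 / y) - math.sqrt(y) * theta(y))


if __name__ == "__main__":
    for y in (0.25, 1.0, 3.0):
        print(y, theta(1.0 / y), math.sqrt(y) * theta(y), theta_equation_gap(y))
```

Note how differently the two sides are computed in practice: for small $y$ the series for $\theta(y)$ needs many terms, while the series for $\theta(1/y)$ is dominated by its first term.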


We are now ready for the proof of the functional equation of the zeta 
function. 


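PROOF OF THEOREM 9.5. We begin with

\[ \Gamma\!\left(\frac{s}{2}\right) = \int_0^\infty e^{-t} t^{s/2 - 1}\,dt = \int_0^\infty e^{-x} x^{s/2}\,\frac{dx}{x}, \]

so, in the domain $\Re(s) > 1 + \delta$,

\[ F(s) = \pi^{-s/2} \sum_{n=1}^{\infty} n^{-s} \int_0^\infty e^{-x} x^{s/2}\,\frac{dx}{x}. \]

Next, replace $x$ by $\pi n^2 y$ in the integral. This means $\frac{dx}{x} = \frac{dy}{y}$, and after some
cancellation we get

\[ F(s) = \int_0^\infty \sum_{n=1}^{\infty} e^{-\pi n^2 y}\,y^{s/2}\,\frac{dy}{y}. \tag{9.13} \]

The interchange of the integral and sum is permitted because the series for
the zeta function converges uniformly on $\Re(s) \ge 1 + \delta$ for any fixed $\delta > 0$.
Define

\[ g(y) = \sum_{n=1}^{\infty} e^{-\pi n^2 y}. \]

Split the integral in Equation (9.13) into $0 < y \le 1$ and $1 \le y < \infty$,

\[ F(s) = \int_1^\infty y^{s/2} g(y)\,\frac{dy}{y} + \int_0^1 y^{s/2} g(y)\,\frac{dy}{y}. \]

In the second integral, change $y$ to $z = y^{-1}$, so that it becomes an integral
over the region $\infty > z \ge 1$, and $\frac{dy}{y} = -\frac{dz}{z}$. Thus

\[ F(s) = \int_1^\infty y^{s/2} g(y)\,\frac{dy}{y} + \int_1^\infty z^{-s/2} g(z^{-1})\,\frac{dz}{z}. \tag{9.14} \]

The Poisson Summation Formula (Theorem 9.17) gave us Theorem 9.18,
which may be applied, using $\theta(y) = 1 + 2g(y)$, to give

\[ g(y^{-1}) = \frac{\theta(y^{-1}) - 1}{2} = \frac{\sqrt{y}\,\theta(y) - 1}{2} = \sqrt{y}\,g(y) + \frac{\sqrt{y} - 1}{2}. \]

Substituting this into Equation (9.14),

\[ F(s) = \int_1^\infty y^{s/2} g(y)\,\frac{dy}{y} + \int_1^\infty y^{(1-s)/2} g(y)\,\frac{dy}{y} + \frac{1}{2} \int_1^\infty y^{-s/2}\left(y^{1/2} - 1\right)\frac{dy}{y}. \tag{9.15} \]

Let $J$ denote the third integral in Equation (9.15), without the factor $\frac{1}{2}$. Then

\[ J = \left[ \frac{y^{(1-s)/2}}{(1-s)/2} - \frac{y^{-s/2}}{-s/2} \right]_1^\infty = 2\left( \frac{-1}{1-s} - \frac{1}{s} \right) = 2\left( \frac{1}{s-1} - \frac{1}{s} \right). \tag{9.16} \]

Lemma 9.19. For all $z \in \mathbb{C}$, the function

\[ G(z) = \int_1^\infty y^{z} g(y)\,\frac{dy}{y} \]

is analytic.


Assuming this for the moment, we have, by Equations (9.15) and (9.16),

\[ F(s) = G\!\left(\frac{s}{2}\right) + G\!\left(\frac{1-s}{2}\right) + \frac{1}{s-1} - \frac{1}{s}, \]

so $F(1 - s) = F(s)$, and $F$ is analytic for all $s \in \mathbb{C}$ apart from simple poles
at $s = 1$ and $s = 0$, completing the proof of Theorem 9.5.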
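
The representation of $F$ obtained at the end of this proof, namely $F(s) = G(s/2) + G((1-s)/2) + \frac{1}{s-1} - \frac{1}{s}$ with $G(z) = \int_1^\infty y^z g(y)\,\frac{dy}{y}$ and $g(y) = \sum_{n \ge 1} e^{-\pi n^2 y}$, converges very quickly and can be evaluated directly. The following sketch is an illustration under stated assumptions: the integral is truncated at `upper` and computed by Simpson's rule, the series for $g$ is truncated after `terms` terms, only real arguments are handled, and the function names are introduced here. The output is compared with $F(2) = \pi^{-1}\Gamma(1)\zeta(2) = \pi/6$ and $F(4) = \pi^{-2}\Gamma(2)\zeta(4) = \pi^2/90$.

```python
import math


def g(y: float, terms: int = 12) -> float:
    """g(y) = sum_{n >= 1} exp(-pi n^2 y), truncated; adequate for y >= 1."""
    return sum(math.exp(-math.pi * n * n * y) for n in range(1, terms + 1))


def G(z: float, upper: float = 41.0, steps: int = 8000) -> float:
    """G(z) = int_1^inf y^z g(y) dy / y by Simpson's rule on [1, upper] (steps even)."""
    h = (upper - 1.0) / steps

    def integrand(y: float) -> float:
        return y ** (z - 1.0) * g(y)

    total = integrand(1.0) + integrand(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(1.0 + j * h)
    return total * h / 3.0


def F_from_theta(s: float) -> float:
    """F(s) = G(s/2) + G((1-s)/2) + 1/(s-1) - 1/s for real s other than 0 and 1."""
    return G(s / 2.0) + G((1.0 - s) / 2.0) + 1.0 / (s - 1.0) - 1.0 / s


if __name__ == "__main__":
    print(F_from_theta(2.0), math.pi / 6.0)
    print(F_from_theta(4.0), math.pi ** 2 / 90.0)
    print(F_from_theta(0.3), F_from_theta(0.7))  # symmetric under s -> 1 - s
```

The symmetry $F(1-s) = F(s)$ is built into the formula, so the last line is a consistency check on the code rather than on the theorem.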


All that remains is to prove the lemma. 
PROOF OF LEMMA 9.19. Write $G(z) = \sum_{n=1}^{\infty} G_n(z)$, where

\[ G_n(z) = \int_n^{n+1} y^{z} g(y)\,\frac{dy}{y}. \]

We will prove that the $G_n$ are analytic functions on all of $\mathbb{C}$ and then use
a uniform convergence argument. Consider the difference quotient for $G_n(z)$
(exactly as in the standard proof of Theorem 8.29 for the analytic continuation
of the zeta function on p. 176),

\[ \frac{1}{h} \int_n^{n+1} y^{z}\left(y^{h} - 1\right) g(y)\,\frac{dy}{y} = \frac{1}{h} \int_n^{n+1} y^{z} g(y)\left(1 + h \log y + \rho(h, y) - 1\right)\frac{dy}{y}, \]

where $\rho(h, y) = O(h^2)$ for bounded values of $y$. One may therefore divide by $h$
and take the limit $h \to 0$.

Next, we prove that the partial sums of the $G_n$ converge uniformly on a
suitable domain. Consider $z$ in the half-plane $\Re(z) < K$ for some fixed $K$.
There

\[ \left| \int_N^\infty y^{z} g(y)\,\frac{dy}{y} \right| \le \int_N^\infty y^{K} |g(y)|\,\frac{dy}{y}. \tag{9.17} \]

Now we estimate $|g(y)|$:

\[ |g(y)| = \sum_{n=1}^{\infty} e^{-\pi n^2 y} \le \sum_{n=1}^{\infty} e^{-\pi n y} = \frac{e^{-\pi y}}{1 - e^{-\pi y}}. \]

The denominator is clearly bounded below for $1 \le y$, so the right-hand side of
the inequality (9.17) is finite for $N = 1$, say. As an immediate consequence, the
integrals from $N$ to infinity must tend to zero, and all this was independent
of $z$. Now we may apply Theorem 8.23 and deduce that $G$ is analytic on
the half-plane $\Re(z) < K$. Since $K$ was arbitrary, $G$ is analytic on the whole
complex plane.


9.6 The Gamma Function Revisited 


We have seen that the zeta function and the Gamma function go together
like Hardy and Wright. We need to know some additional properties of $\Gamma$
(in particular that $\Gamma(s) \ne 0$ for all $s \in \mathbb{C}$) in order to understand the zeta
function better.


Theorem 9.20. [WEIERSTRASS] Define a function $f$ by

\[ f(s) = s\,e^{\gamma s} \prod_{n=1}^{\infty} \left(1 + \frac{s}{n}\right) e^{-s/n}, \]

where $\gamma$ is the Euler–Mascheroni constant. Then $f$ is an analytic function on
the whole of the complex plane, and it is zero at $0, -1, -2, -3, \ldots$ only.


This rather mysterious function turns out to satisfy $f(s) = 1/\Gamma(s)$, giving
another formula for the Gamma function and incidentally proving that $\Gamma(s)$
is nonzero for all $s \in \mathbb{C}$ (Theorem 9.4). The argument may appear at first
sight an infuriating piece of magic, but it appears more reasonable when
thought of as a (functional) factorization. We know that $\Gamma$ has simple poles
at $0, -1, -2, \ldots$, so $1/\Gamma$ must have zeros there. The most naive approach is
to look for a factorization of $1/\Gamma(s)$ in the form

\[ C s(s+1)(s+2)\cdots, \]

but this expression clearly does not converge. Trying to correct the most
obvious defect (that the terms do not converge to 1) would lead one to look
for expressions such as

\[ C s(1+s)(1+s/2)(1+s/3)\cdots, \]

but this is still not convergent because $\sum_{n=1}^{\infty} \frac{1}{n}$ is not convergent. What is
needed is a factorization in which the terms converge to 1 fast enough to
guarantee convergence of the infinite product. The argument below gives a
quadratic rate of convergence. This kind of adjustment became a standard
tool in nineteenth-century analytic number theory, and we will encounter it
several times.
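
The claim that the Weierstrass product reproduces $1/\Gamma(s)$ can be explored numerically before it is proved. The sketch below is an illustration only: it multiplies out the first `factors` terms of the product in Theorem 9.20 for real $s$ and compares the result with the reciprocal of the library Gamma function. The function name, the number of factors and the hard-coded decimal value of the Euler–Mascheroni constant are choices made here. The truncation error in the logarithm is roughly $s^2/(2N)$ after $N$ factors, so convergence is slow.

```python
import math

# Decimal value of the Euler-Mascheroni constant (hard-coded for this sketch).
EULER_GAMMA = 0.5772156649015329


def weierstrass_f(s: float, factors: int = 200_000) -> float:
    """Partial product s * e^{gamma s} * prod_{n <= factors} (1 + s/n) e^{-s/n}."""
    product = s * math.exp(EULER_GAMMA * s)
    for n in range(1, factors + 1):
        product *= (1.0 + s / n) * math.exp(-s / n)
    return product


if __name__ == "__main__":
    for s in (0.5, 2.5):
        print(s, weierstrass_f(s), 1.0 / math.gamma(s))
    # The factor with n = 2 vanishes at s = -2, one of the zeros of f.
    print(weierstrass_f(-2.0))
```

The exponential factors $e^{-s/n}$ are exactly the "doctoring" described above: without them the partial products would grow or decay like a power of $N$.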


PROOF OF THEOREM 9.20. Consider the function

\[ g(s) = \sum_{n=1}^{\infty} g_n(s) = \sum_{n=1}^{\infty} \left[ \log\left(1 + \frac{s}{n}\right) - \frac{s}{n} \right]. \tag{9.18} \]

Each $g_n$ is analytic except at $-1, -2, -3, \ldots$. We want to prove that the
series in Equation (9.18) converges uniformly on $\{s \in \mathbb{C} : |s| < K\}$ for every
fixed $K > 0$. Choose $N > 2K$, so for all $n \ge N$, $|s/n| < 1/2$, and therefore

\[ \log\left(1 + \frac{s}{n}\right) = \frac{s}{n} - \frac{1}{2}\left(\frac{s}{n}\right)^2 + \frac{1}{3}\left(\frac{s}{n}\right)^3 - \cdots. \]

Thus we can estimate $g_n(s)$ for all these $s$ and $n$ by

\[ |g_n(s)| \le \frac{1}{2}\left|\frac{s}{n}\right|^2 + \frac{1}{3}\left|\frac{s}{n}\right|^3 + \cdots \le \frac{|s|^2}{n^2} \cdot \frac{1}{1 - |s|/n} \le \frac{2|s|^2}{n^2} \le \frac{2K^2}{n^2}. \]

This can be summed from $n = N$ to give

\[ \sum_{n=N}^{\infty} |g_n(s)| \le \sum_{n=N}^{\infty} \frac{2K^2}{n^2}, \]

and the latter is arbitrarily small if $N$ is large, as it is the tail end of a
convergent series. Thus the series in Equation (9.18) is a uniformly convergent
sum of analytic functions $g_n(s)$ on $|s| < K$ for any $K$. By Theorem 8.23, we
deduce that the limit $g(s)$ is analytic for all $s$ not equal to $-1, -2, -3, \ldots$.
The same holds for

\[ e^{g(s)} = \prod_{n=1}^{\infty} \left(1 + \frac{s}{n}\right) e^{-s/n}. \]

After multiplying this by $s\,e^{\gamma s}$, we see that $f$ is an analytic function away
from $-1, -2, -3, \ldots$. It is clear that $f$ has zeros at each of these points, so $\log f$
has a singularity there. Conversely, away from these obvious zeros, we have
shown that $\log f$ is analytic, so $f$ cannot be zero elsewhere. Finally, for some
fixed $m \in \mathbb{N}$, consider the infinite product defining $f$ without the factor corresponding
to $n = m$. The same estimates as above show that the logarithm
of this is analytic at $s = -m$, so $f$ is analytic at $s = -m$ as well.


Corollary 9.21. The zeros of $f$ in Theorem 9.20 are all simple, and the function

\[ \frac{1}{f(s)} \]

is analytic on $\mathbb{C}$ apart from simple poles at $0, -1, -2, \ldots$. The function $1/f$
has no zeros at all (because $f$ has no poles).


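Theorem 9.22. [EULER] For all $s \ne 0, -1, -2, \ldots$,

\[ \frac{1}{f(s)} = \frac{1}{s} \prod_{n=1}^{\infty} \left(1 + \frac{1}{n}\right)^{s} \left(1 + \frac{s}{n}\right)^{-1}. \]

PROOF. We use the definition of the Euler–Mascheroni constant $\gamma$ (see Exercise 1.2
on p. 10 or Theorem 8.3).

\[ f(s) = s \lim_{m \to \infty} e^{s(1 + 1/2 + \cdots + 1/m - \log m)} \lim_{N \to \infty} \prod_{n=1}^{N} \left(1 + \frac{s}{n}\right) e^{-s/n} \]

\[ = s \lim_{m \to \infty} e^{s(1 + 1/2 + \cdots + 1/m - \log m)} \prod_{n=1}^{m} \left(1 + \frac{s}{n}\right) e^{-s/n} \]

\[ = s \lim_{m \to \infty} m^{-s} \prod_{n=1}^{m} \left(1 + \frac{s}{n}\right). \tag{9.19} \]

Now we pull a rabbit out of the hat: Write $m$ as

\[ m = \frac{2}{1} \cdot \frac{3}{2} \cdots \frac{m-1}{m-2} \cdot \frac{m}{m-1} = \left(1 + \frac{1}{1}\right)\left(1 + \frac{1}{2}\right) \cdots \left(1 + \frac{1}{m-1}\right), \tag{9.20} \]

where as usual an empty product (the case $m = 1$) is defined to be 1. Substitute
this into Equation (9.19) and use the fact that for all $s$

\[ \lim_{m \to \infty} \left(1 + \frac{1}{m}\right)^{s} = 1 \]

to see that Equation (9.19) becomes

\[ f(s) = s \prod_{n=1}^{\infty} \left(1 + \frac{1}{n}\right)^{-s} \left(1 + \frac{s}{n}\right). \]

Now invert both sides, and the proof of Theorem 9.22 is complete.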


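Corollary 9.23. For all $s \in \mathbb{C}$,

\[ \frac{1}{f(s)} = \lim_{m \to \infty} \frac{1 \cdot 2 \cdots (m-1)\,m^{s}}{s(s+1) \cdots (s+m-1)}. \]

PROOF. By Theorem 9.22, for $s \ne 0, -1, -2, \ldots$

\[ \frac{1}{f(s)} = \frac{1}{s} \lim_{m \to \infty} \prod_{n=1}^{m-1} \left(1 + \frac{1}{n}\right)^{s} \left(1 + \frac{s}{n}\right)^{-1} \]

\[ = \lim_{m \to \infty} \frac{1}{s} \cdot \frac{\prod_{n=1}^{m-1} \left(1 + \frac{1}{n}\right)^{s}}{\left(1 + \frac{s}{1}\right) \cdots \left(1 + \frac{s}{m-1}\right)} \]

\[ = \lim_{m \to \infty} \frac{1}{s} \cdot \frac{1 \cdot 2 \cdots (m-1) \prod_{n=1}^{m-1} \left(1 + \frac{1}{n}\right)^{s}}{(s+1)(s+2) \cdots (s+m-1)}, \]

where we have just multiplied the numerator and denominator by
$2 \cdot 3 \cdots (m-1)$.
Now collect the integers in the numerator into one product and the other
factors into a product

\[ \prod_{n=1}^{m-1} \left(1 + \frac{1}{n}\right)^{s} = m^{s} \]

by the identity (9.20). This completes the proof of the corollary.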
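
The limit in Corollary 9.23 is also easy to evaluate for moderate $m$. The sketch below is an illustration only, for real $s > 0$: it computes the quotient $1 \cdot 2 \cdots (m-1)\,m^s / \big(s(s+1)\cdots(s+m-1)\big)$ as a running product of ratios, which avoids forming the huge factorial, and compares it with the library Gamma function. The name `gauss_limit_term` is a label chosen here.

```python
import math


def gauss_limit_term(s: float, m: int) -> float:
    """Return (1 * 2 * ... * (m-1)) * m^s / (s (s+1) ... (s+m-1)) for real s > 0."""
    value = m ** s / s
    for n in range(1, m):
        # Pair the integer n in the numerator with (s + n) in the denominator.
        value *= n / (s + n)
    return value


if __name__ == "__main__":
    for m in (10, 1_000, 100_000):
        print(m, gauss_limit_term(0.5, m), math.gamma(0.5))
```

The printed values approach $\Gamma(1/2) = \sqrt{\pi}$ with an error that shrinks roughly like $1/m$, consistent with the identification $1/f = \Gamma$ that is proved next.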


Theorem 9.24. For all $s$ such that $\Re(s) > 0$,

\[ \frac{1}{f(s)} = \Gamma(s) = \int_0^\infty e^{-t} t^{s-1}\,dt. \]

Thus we have three representations of the Gamma function: Definition 9.1
and the ones given in Theorems 9.20 and 9.22. The ability to move between
these different formulations will be very useful.


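PROOF OF THEOREM 9.24. For $n \in \mathbb{N}$, define

\[ I_n(s) = \int_0^n \left(1 - \frac{t}{n}\right)^{n} t^{s-1}\,dt. \]

Evaluate $I_n(s)$ using integration by parts. Substitute $t = n\tau$ to give

\[ I_n(s) = n^{s} \int_0^1 (1 - \tau)^{n} \tau^{s-1}\,d\tau \]

\[ = n^{s} \left[ \frac{(1-\tau)^{n} \tau^{s}}{s} \right]_0^1 + n^{s}\,\frac{n}{s} \int_0^1 (1 - \tau)^{n-1} \tau^{s}\,d\tau \]

\[ = n^{s}\,\frac{n(n-1)}{s(s+1)} \int_0^1 (1 - \tau)^{n-2} \tau^{s+1}\,d\tau = \cdots \]

\[ = n^{s}\,\frac{n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1}{s(s+1) \cdots (s+n)}. \]

Now let $n$ tend to infinity, and use Corollary 9.23, which shows that

\[ \lim_{n \to \infty} I_n(s) = \frac{1}{f(s)}. \]

To complete the proof of Theorem 9.24, we need to prove that

\[ \lim_{n \to \infty} I_n(s) = \Gamma(s). \]

This is plausible because

\[ \lim_{n \to \infty} \left(1 - \frac{t}{n}\right)^{n} = e^{-t} \tag{9.21} \]

for all $t$. (To prove this, just take logarithms, replace $1/n$ by $h$, and apply
l'Hôpital's rule.) However, to apply Equation (9.21) to our problem, an exchange
of limit and integral is required. We must therefore prove that

\[ \lim_{n \to \infty} \int_0^n \left[ e^{-t} - \left(1 - \frac{t}{n}\right)^{n} \right] t^{s-1}\,dt = 0. \tag{9.22} \]

Estimate the integrand in Equation (9.22) by

\[ \left| \left[ e^{-t} - \left(1 - \frac{t}{n}\right)^{n} \right] t^{s-1} \right| \le \left| e^{-t} - \left(1 - \frac{t}{n}\right)^{n} \right| t^{\sigma - 1}, \]

where $\sigma = \Re(s)$. We need the following estimate:

\[ 0 \le e^{-t} - \left(1 - \frac{t}{n}\right)^{n} \le \frac{t^{2} e^{-t}}{n} \tag{9.23} \]

for all $t \in [0, n]$. Assuming this,

\[ \left| \int_0^n \left[ e^{-t} - \left(1 - \frac{t}{n}\right)^{n} \right] t^{s-1}\,dt \right| \le \frac{1}{n} \int_0^n e^{-t} t^{\sigma + 1}\,dt \le \frac{1}{n} \int_0^\infty e^{-t} t^{\sigma + 1}\,dt = \frac{\Gamma(\sigma + 2)}{n}, \]

which obviously tends to zero. (Note that the convergence is even uniform for
bounded $s$, although we do not need this here.)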


Exercise 9.4. Prove the inequality (9.23). 


Exercise 9.5. Using logarithmic differentiation on the representation of $\Gamma$ in
Theorem 9.20, prove that

\[ \Gamma'(1) = -\gamma. \tag{9.24} \]


Corollary 9.25. For all $s \in \mathbb{C}$, $s \notin \mathbb{Z}$,

\[ \Gamma(s)\,\Gamma(1 - s) = \frac{\pi}{\sin(\pi s)}. \]


PROOF. By Theorem 9.20,

\[ \Gamma(s)\,\Gamma(-s) = -\frac{1}{s^{2}} \prod_{n=1}^{\infty} \left(1 + \frac{s}{n}\right)^{-1} e^{s/n} \prod_{n=1}^{\infty} \left(1 - \frac{s}{n}\right)^{-1} e^{-s/n} = -\frac{1}{s^{2}} \prod_{n=1}^{\infty} \left(1 - \frac{s^{2}}{n^{2}}\right)^{-1} = -\frac{\pi}{s \sin(\pi s)} \]

using the classical formula

\[ \sin(\pi s) = \pi s \prod_{n=1}^{\infty} \left(1 - \frac{s^{2}}{n^{2}}\right). \tag{9.25} \]

The corollary follows because $-s\,\Gamma(-s) = \Gamma(1 - s)$.


Equation (9.25) is another example of an analog of the Fundamental Theorem
of Arithmetic in a function-theory context. We know that $\sin(\pi s)$ vanishes
at each integer, so we might hope to factorize it in the form

\[ C s(s-1)(s+1)(s-2)(s+2)\cdots. \]

Of course, this does not converge, and attempting to get the terms to converge
to 1 fast enough to guarantee convergence of the infinite product plausibly
leads one to conjecture Equation (9.25).


Exercise 9.6. Prove the identity (9.25).


Exercise 9.7. Justify the steps in the following argument. The Taylor expansion
of the sine function gives

\[ \sin(\pi s) = \pi s - \frac{\pi^{3} s^{3}}{3!} + \cdots. \tag{9.26} \]

By Equation (9.25), this is equal to

\[ \pi s \left( 1 - s^{2} \left( \frac{1}{1^{2}} + \frac{1}{2^{2}} + \frac{1}{3^{2}} + \cdots \right) + \cdots \right) = \pi s - \pi s^{3} \sum_{n=1}^{\infty} \frac{1}{n^{2}} + \cdots. \]

Comparing the coefficient of $s^{3}$ with that of Equation (9.26) gives

\[ \sum_{n=1}^{\infty} \frac{1}{n^{2}} = \frac{\pi^{2}}{6}. \]


Exercise 9.8. Prove that $\zeta(2k)$ is a rational multiple of $\pi^{2k}$ for any $k \ge 1$.


Much less is known about the values $\zeta(3), \zeta(5), \ldots$. Apéry proved in 1978
that $\zeta(3) \notin \mathbb{Q}$, and there are some very deep results on the algebraic independence
of various values of $\zeta$ at odd integers.


Exercise 9.9. This exercise is a more explicit version of the previous one.
(a) Replace $s$ by $iz$ in Equation (9.25) to deduce that

\[ \sinh(\pi z) = \pi z \prod_{n=1}^{\infty} \left(1 + \frac{z^{2}}{n^{2}}\right). \tag{9.27} \]

(b) Use logarithmic differentiation to prove

\[ \pi z \coth(\pi z) = 1 + 2 \sum_{k=1}^{\infty} (-1)^{k+1} \zeta(2k)\,z^{2k}. \tag{9.28} \]

(c) Deduce that

\[ \zeta(2k) = (-1)^{k-1} \pi^{2k} \left( \frac{2^{2k-1} B_{2k}}{(2k)!} \right), \tag{9.29} \]

where $B_n$ denotes the $n$th Bernoulli number defined by

\[ \frac{z}{e^{z} - 1} = \sum_{n=0}^{\infty} \frac{B_n z^{n}}{n!}. \tag{9.30} \]


Exercise 9.10. (a) Use Theorem 9.5 and Equation (9.29) to prove that $\zeta$
takes rational values at negative odd integers.
(b) Use Equation (9.30) to show that $B_n = 0$ for odd integers $n > 1$.
(c) Deduce that

\[ \zeta(-n) = -\frac{B_{n+1}}{n+1} \tag{9.31} \]

for all $n \ge 1$.
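
The formulas in Exercises 9.9 and 9.10 are pleasant to check with exact rational arithmetic. The sketch below is an illustration, not a solution to the exercises: it generates Bernoulli numbers from the standard recurrence $\sum_{j=0}^{n} \binom{n+1}{j} B_j = 0$ for $n \ge 1$, which follows from multiplying Equation (9.30) by $e^z - 1$, and then evaluates the right-hand sides of Equations (9.29) and (9.31). The function names are chosen here.

```python
from fractions import Fraction
from math import comb, factorial, pi


def bernoulli_numbers(n_max: int) -> list[Fraction]:
    """Return [B_0, ..., B_{n_max}] with the convention z/(e^z - 1) = sum B_n z^n / n!."""
    b = [Fraction(1)]
    for n in range(1, n_max + 1):
        acc = sum(comb(n + 1, j) * b[j] for j in range(n))
        b.append(-acc / (n + 1))
    return b


def zeta_even(k: int) -> float:
    """zeta(2k) from the right-hand side of Equation (9.29)."""
    b = bernoulli_numbers(2 * k)
    rational = Fraction((-1) ** (k - 1) * 2 ** (2 * k - 1)) * b[2 * k] / factorial(2 * k)
    return float(rational) * pi ** (2 * k)


def zeta_negative(n: int) -> Fraction:
    """zeta(-n) = -B_{n+1} / (n + 1) for integers n >= 1, as in Equation (9.31)."""
    return -bernoulli_numbers(n + 1)[n + 1] / (n + 1)


if __name__ == "__main__":
    print(bernoulli_numbers(6))
    print(zeta_even(1), pi ** 2 / 6)
    print(zeta_even(2), pi ** 4 / 90)
    print([zeta_negative(n) for n in range(1, 6)])
```

The vanishing of $\zeta(-2), \zeta(-4), \ldots$ shows up here as the vanishing of the odd-index Bernoulli numbers $B_3, B_5, \ldots$.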


The neatness of Equation (9.31) suggests there might be a more elegant 
way to prove it. Hurwitz found a beautiful proof using complex analysis. 


Exercise 9.11. Use the functional equation together with Equations (9.24)
and (8.24) to prove that

\[ \frac{\zeta'(0)}{\zeta(0)} = \log(2\pi). \tag{9.32} \]

Prove that $\zeta(0) = -\frac{1}{2}$ and deduce the value of $\zeta'(0)$.

Exercise 9.12. *Prove that $\sum_{n=-\infty}^{\infty} \frac{1}{(4n+1)^{k}}$ is a rational multiple of $\pi^{k}$ for
any $k \ge 2$.


There are many deep results on the location and distribution of the zeros 
of the Riemann zeta function, all far beyond our scope. 


Theorem 9.26. Define $N(T)$ to be the number of zeros of the Riemann zeta
function in the critical strip up to height $T$,

\[ N(T) = \left| \{ s \in \mathbb{C} : 0 < \Re(s) < 1,\ \zeta(s) = 0,\ 0 < \Im(s) < T \} \right|. \]

Then there is an asymptotic formula,

\[ N(T) = \frac{T}{2\pi} \log\left(\frac{T}{2\pi}\right) - \frac{T}{2\pi} + O(\log T). \]


The proof makes use of Stirling's Formula extended to the complex plane,

\[ \log \Gamma(s) = -s + \left(s - \frac{1}{2}\right) \log s + O(1), \]

provided $|\mathrm{Arg}(s)| < \pi - \delta$.


Exercise 9.13. Define a function $\nu$ by $\nu(1) = 0$, and $\nu(n)$ is the number of
distinct prime divisors of $n$ for $n > 1$.

(a) Prove that $\displaystyle \sum_{n=1}^{\infty} \frac{\nu(n)}{n^{s}} = \zeta(s) \sum_{p \in \mathbb{P}} \frac{1}{p^{s}}$.

(b) Prove that $\displaystyle \sum_{n=1}^{\infty} \frac{2^{\nu(n)}}{n^{s}} = \frac{\zeta(s)^{2}}{\zeta(2s)}$.


At the start of this chapter, the idea of “factorizing” functions in the way
that polynomials are factorized was discussed. Quite apart from the convergence
issues that pervade this topic, infinite products may behave in quite
surprising ways, as shown by the next exercise.


Exercise 9.14. Using Exercise 8.11, show that, for any $x$ with $|x| < 1$,

\[ e^{x} = \prod_{n=1}^{\infty} \left(1 - x^{n}\right)^{-\mu(n)/n}. \]
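
The product in Exercise 9.14 converges quickly for $|x| < 1$, so it can be tested numerically. The following sketch is an illustration only and does not replace the proof requested in the exercise: `mobius` computes $\mu(n)$ by trial division, and `exp_product` multiplies out the first `factors` terms of the product for real $x$ with $|x| < 1$. Both names and the cutoff are choices made here.

```python
import math


def mobius(n: int) -> int:
    """Moebius function mu(n) for n >= 1, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a square divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def exp_product(x: float, factors: int = 200) -> float:
    """Partial product prod_{n <= factors} (1 - x^n)^(-mu(n)/n), for real |x| < 1."""
    product = 1.0
    for n in range(1, factors + 1):
        mu = mobius(n)
        if mu != 0:
            product *= (1.0 - x ** n) ** (-mu / n)
    return product


if __name__ == "__main__":
    for x in (0.5, -0.3, 0.9):
        print(x, exp_product(x), math.exp(x))
```

Taking logarithms of the product and collecting powers of $x$ reduces the identity to the fact that $\sum_{d \mid m} \mu(d)$ vanishes for $m > 1$, which is the arithmetic content hiding behind the analytic statement.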


The functional equations we have considered in this chapter are analytic 
properties of known classical functions. The next exercise is (relatively) light 
relief and is a functional equation in another sense: The unknown solution 
sought is a function. 


Exercise 9.15. *Find the solutions to the functional equation 


f(xz — w)F(@) Fly) + 8£(0) = 1+ 2f(0) F(0) + f(x) f(y) for all ay, 2 € R. 


Does the solution change if the identity is only required to hold for all $x, y, z$
in $\mathbb{Z}$?


9.6.1 Factorizing the Riemann Zeta Function 


Several times in this chapter, we have seen a function factorize in a meaningful
way into an infinite product of “irreducible” terms corresponding to zeros,
giving a function-theoretic version of the Fundamental Theorem of
Arithmetic. The Riemann Hypothesis itself can be understood in these terms,
except that the location of the zeros is not known.


Theorem 9.27. [HADAMARD] Let $\Xi$ denote the set of zeros of the Riemann
zeta function in the critical strip $\{z \mid 0 < \Re(z) < 1\}$. Then

\[ \zeta(s) = \frac{e^{bs}}{2(s-1)\,\Gamma\!\left(\frac{s}{2} + 1\right)} \prod_{\rho \in \Xi} \left(1 - \frac{s}{\rho}\right) e^{s/\rho}, \]

where $b = \log(2\pi) - 1 - \frac{\gamma}{2}$.


In this theorem, the zeros of the zeta function outside the critical strip are
accounted for by the poles of $\Gamma\!\left(\frac{s}{2} + 1\right)$.


Exercise 9.16. Assuming the statement of Theorem 9.27 for some constant $b$,
show that it must have the stated value by using Exercise 9.11.


NOTES TO CHAPTER 9: For a very interesting discussion of both the mathematics 
and the history of the type of analysis used in this chapter, and in particular to
gain some insight into how Euler came close to the functional equation, see Hardy’s
monograph [74]. An elegant guide to classical Fourier analysis may be found in 
Katznelson’s book [87]. Apéry’s proof that ¢(3) is irrational appeared in his paper [3]; 
an accessible account is provided by van der Poorten [118]. More recent results on 
values of the zeta function at odd integers appear in works by Ball and Rivoal [9] or 
Rivoal [130] and references therein. The disproof of Mertens' conjecture mentioned
on p. 186 appears in the paper of Odlyzko and te Riele [114]. A comprehensive 
guide to many of the analytic arguments here, including Exercises 9.4 and 9.6 is 
the classic text of Whittaker and Watson [160]. Artin’s book [6] is an exceptionally 
clear account of the main properties of the Gamma function. Deeper properties of 
the zeta function, emphasizing the role of Poisson summation, may be found in 
Patterson’s book [115]. Several different approaches to the functional equation for 
the Riemann zeta function appear in the book of Titchmarsh [153]. For a recent 
overview of the Riemann Hypothesis written by a worker in the field, consult the 
survey of Conrey [33]. Exercise 9.12 is classical; a proof requiring little background 
appears in a paper of Beukers, Kolk and Calabi [13] and is discussed in a paper of 
Elkies [50]. Exercise 9.14 is taken from a paper of Brent [19]. Exercise 9.15 is taken 
from a paper of Sunik [148]. 


